[jira] [Created] (SPARK-42089) Different result in nested lambda function
Ruifeng Zheng created SPARK-42089: - Summary: Different result in nested lambda function Key: SPARK-42089 URL: https://issues.apache.org/jira/browse/SPARK-42089 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng test_nested_higher_order_function {code:java} Traceback (most recent call last): File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", line 814, in test_nested_higher_order_function self.assertEquals(actual, expected) AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')] First differing element 0: Row(n='a', l='a') (1, 'a') - [Row(n='a', l='a'), - Row(n='b', l='b'), - Row(n='c', l='c'), - Row(n='a', l='a'), - Row(n='b', l='b'), - Row(n='c', l='c'), - Row(n='a', l='a'), - Row(n='b', l='b'), - Row(n='c', l='c')] + [(1, 'a'), + (1, 'b'), + (1, 'c'), + (2, 'a'), + (2, 'b'), + (2, 'c'), + (3, 'a'), + (3, 'b'), + (3, 'c')] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
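The expected list in the failing assertion is the full cross product of the outer and inner collections. A plain-Python sketch of the semantics the test expects (illustration only, not the PySpark API; `transform`, `numbers`, and `letters` are stand-in names): the inner lambda must close over the *outer* lambda's variable, whereas the reported bug resolves it to the inner element so every pair degenerates to `(l, l)`.

```python
# Plain-Python model of a nested higher-order function (stand-in names,
# not the Spark API).
def transform(xs, f):
    return [f(x) for x in xs]

numbers = [1, 2, 3]
letters = ["a", "b", "c"]

# The inner lambda closes over the outer element n; flattening the nested
# result yields the cross product the test expects.
pairs = [
    p
    for sub in transform(numbers, lambda n: transform(letters, lambda l: (n, l)))
    for p in sub
]
```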
[jira] (SPARK-41471) SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning
[ https://issues.apache.org/jira/browse/SPARK-41471 ] Mars deleted comment on SPARK-41471: -- was (Author: JIRAUSER290821): [~csun] Hi, I want to take it :) > SPJ: Reduce Spark shuffle when only one side of a join is > KeyGroupedPartitioning > > > Key: SPARK-41471 > URL: https://issues.apache.org/jira/browse/SPARK-41471 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > When only one side of a SPJ (Storage-Partitioned Join) is > {{{}KeyGroupedPartitioning{}}}, Spark currently needs to shuffle both sides > using {{{}HashPartitioning{}}}. However, we may just need to shuffle the > other side according to the partition transforms defined in > {{{}KeyGroupedPartitioning{}}}. This is especially useful when the other side > is relatively small. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
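The proposed optimization can be pictured with a toy model (plain Python, not Spark internals; all names here are illustrative): side A is already grouped by key into partitions, so instead of hash-shuffling both sides we regroup only side B into A's existing layout and then join each pair of partitions locally.

```python
# Toy sketch: side A is already key-grouped; only side B gets shuffled.
a_partitions = {1: [("k1", "a1")], 2: [("k2", "a2")]}  # partition id -> rows

def regroup(rows, layout):
    """Shuffle rows into the same key layout as an already-grouped side."""
    key_to_part = {row[0]: p for p, rs in layout.items() for row in rs}
    out = {p: [] for p in layout}
    for row in rows:
        out[key_to_part[row[0]]].append(row)
    return out

b_rows = [("k2", "b2"), ("k1", "b1")]
b_partitions = regroup(b_rows, a_partitions)

# Partition-wise join: no shuffle of side A was needed.
joined = {
    p: [(ar[0], ar[1], br[1])
        for ar in a_partitions[p]
        for br in b_partitions[p]
        if ar[0] == br[0]]
    for p in a_partitions
}
```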
[jira] [Comment Edited] (SPARK-38230) InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases
[ https://issues.apache.org/jira/browse/SPARK-38230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677162#comment-17677162 ] Xiaomin Zhang edited comment on SPARK-38230 at 1/16/23 10:54 AM: - Hello [~coalchan] Thanks for working on this. I created PR based on your work with some improvements as per [~Jackey Lee]'s comment. [~roczei] Can you please review the PR and let me know if I missed anything? Thank you. was (Author: ximz): Hello [~coalchan] Thanks for working on this. I created PR based on your work with some improvements as per [~Jackey Lee]'s comment. Now we don't need a new parameter and Spark will only invoke listPartitions for the case of overwriting hive static partitions. [~roczei] Can you please review the PR and let me know if I missed anything? Thank you. > InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions > in most cases > --- > > Key: SPARK-38230 > URL: https://issues.apache.org/jira/browse/SPARK-38230 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.2 >Reporter: Coal Chan >Priority: Major > > In > `org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand`, > `sparkSession.sessionState.catalog.listPartitions` will call method > `org.apache.hadoop.hive.metastore.listPartitionsPsWithAuth` of hive metastore > client, this method will produce multiple queries per partition on hive > metastore db. So when you insert into a table which has too many > partitions(ie: 10k), it will produce too many queries on hive metastore > db(ie: n * 10k = 10nk), it puts a lot of strain on the database. > In fact, it calls method `listPartitions` in order to get locations of > partitions and get `customPartitionLocations`. But in most cases, we do not > have custom partitions, we can just get partition names, so we can call > method listPartitionNames. 
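The query-count argument above can be illustrated with a toy model (hypothetical functions, not the real Hive metastore client API): fetching full partition details costs multiple database queries per partition, while fetching only partition names is a single query, so for ~10k partitions the difference is tens of thousands of queries versus one.

```python
# Toy model of the two metastore-call patterns (illustrative only).
calls = {"queries": 0}

def list_partitions(n_parts, queries_per_partition=3):
    # listPartitionsPsWithAuth-style: several DB queries *per partition*
    # (the per-partition multiplier here is an assumption for illustration).
    calls["queries"] += n_parts * queries_per_partition
    return [{"name": f"p={i}", "location": f"/tbl/p={i}"} for i in range(n_parts)]

def list_partition_names(n_parts):
    # listPartitionNames-style: one query returning all names.
    calls["queries"] += 1
    return [f"p={i}" for i in range(n_parts)]

full = list_partitions(10_000)         # ~30,000 queries in this model
names = list_partition_names(10_000)   # 1 query
```

When no custom partition locations exist, the names alone are enough to compute the paths to overwrite, which is the observation behind the proposed change.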
[jira] [Resolved] (SPARK-41993) Move RowEncoder to AgnosticEncoders
[ https://issues.apache.org/jira/browse/SPARK-41993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41993. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39517 [https://github.com/apache/spark/pull/39517] > Move RowEncoder to AgnosticEncoders > --- > > Key: SPARK-41993 > URL: https://issues.apache.org/jira/browse/SPARK-41993 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.4.0 > > > Move RowEncoder to the AgnosticEncoder framework. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
[ https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42087. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39601 [https://github.com/apache/spark/pull/39601] > Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars. > --- > > Key: SPARK-42087 > URL: https://issues.apache.org/jira/browse/SPARK-42087 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
[ https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42087: - Assignee: Dongjoon Hyun > Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars. > --- > > Key: SPARK-42087 > URL: https://issues.apache.org/jira/browse/SPARK-42087 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36728) Can't create datetime object from anything other than year column Pyspark - koalas
[ https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36728. -- Resolution: Duplicate Thanks for letting me know

> Can't create datetime object from anything other than year column Pyspark - koalas
> --
>
> Key: SPARK-36728
> URL: https://issues.apache.org/jira/browse/SPARK-36728
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: pyspark_date.txt, pyspark_date2.txt
>
> If I create a datetime object, it must be from columns named year.
>
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3], 'day': [4, 5],
>                    'hour': [2, 3], 'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
>
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
>
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3], 'testday': [4, 5],
>                         'hour': [2, 3], 'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480         if cond is None and limit is None and returns_series:
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352         if not found:
>    1353             if missing_keys is None:
> -> 1354                 raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355             else:
>    1356                 missing_keys.append(key)
> KeyError: "['testyear'] not in index"
>
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25
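Note that the failing call in the repro indexes `df` (which has no `testyear` column) rather than `df_test`, which is what raises the KeyError shown. Independently of that, pandas-style `to_datetime` assembles datetimes only from canonically named component columns (`year`, `month`, `day`, optionally `hour`/`minute`/`second`). A minimal pure-Python sketch of that naming contract (`to_datetime_from_columns` is a hypothetical helper, not the pyspark.pandas API):

```python
from datetime import datetime

# Hypothetical helper mimicking the column-name contract: assembly works
# only when the canonical names "year", "month", "day" are present.
REQUIRED = ("year", "month", "day")

def to_datetime_from_columns(cols):
    missing = [c for c in REQUIRED if c not in cols]
    if missing:
        raise KeyError(f"{missing} not in index")
    n = len(cols["year"])
    zeros = [0] * n
    return [
        datetime(cols["year"][i], cols["month"][i], cols["day"][i],
                 cols.get("hour", zeros)[i],
                 cols.get("minute", zeros)[i],
                 cols.get("second", zeros)[i])
        for i in range(n)
    ]
```

Renamed columns such as `testyear` fail the lookup, which mirrors the behavior the reporter observed.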
[jira] [Assigned] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42032: Assignee: Ruifeng Zheng > Map data show in different order > > > Key: SPARK-42032 > URL: https://issues.apache.org/jira/browse/SPARK-42032 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > > not sure whether this needs to be fixed: > {code:java} > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1623, in pyspark.sql.connect.functions.transform_keys > Failed example: > df.select(transform_keys( > "data", lambda k, _: upper(k)).alias("data_upper") > ).show(truncate=False) > Expected: > +-+ > |data_upper | > +-+ > |{BAR -> 2.0, FOO -> -2.0}| > +-+ > Got: > +-+ > |data_upper | > +-+ > |{FOO -> -2.0, BAR -> 2.0}| > +-+ > > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1630, in pyspark.sql.connect.functions.transform_values > Failed example: > df.select(transform_values( > "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v) > ).alias("new_data")).show(truncate=False) > Expected: > +---+ > |new_data | > +---+ > |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}| > +---+ > Got: > +---+ > |new_data | > +---+ > |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}| > +---+ > > ** >1 of 2 in pyspark.sql.connect.functions.transform_keys >1 of 2 in pyspark.sql.connect.functions.transform_values > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42032. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39600 [https://github.com/apache/spark/pull/39600] > Map data show in different order > > > Key: SPARK-42032 > URL: https://issues.apache.org/jira/browse/SPARK-42032 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > not sure whether this needs to be fixed: > {code:java} > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1623, in pyspark.sql.connect.functions.transform_keys > Failed example: > df.select(transform_keys( > "data", lambda k, _: upper(k)).alias("data_upper") > ).show(truncate=False) > Expected: > +-+ > |data_upper | > +-+ > |{BAR -> 2.0, FOO -> -2.0}| > +-+ > Got: > +-+ > |data_upper | > +-+ > |{FOO -> -2.0, BAR -> 2.0}| > +-+ > > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1630, in pyspark.sql.connect.functions.transform_values > Failed example: > df.select(transform_values( > "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v) > ).alias("new_data")).show(truncate=False) > Expected: > +---+ > |new_data | > +---+ > |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}| > +---+ > Got: > +---+ > |new_data | > +---+ > |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}| > +---+ > > ** >1 of 2 in pyspark.sql.connect.functions.transform_keys >1 of 2 in pyspark.sql.connect.functions.transform_values > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41988) Fix map_filter and map_zip_with output order
[ https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41988. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39600 [https://github.com/apache/spark/pull/39600] > Fix map_filter and map_zip_with output order > > > Key: SPARK-41988 > URL: https://issues.apache.org/jira/browse/SPARK-41988 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", > line 1423, in pyspark.sql.connect.functions.map_filter > Failed example: > df.select(map_filter( > "data", lambda _, v: v > 30.0).alias("data_filtered") > ).show(truncate=False) > Expected: > +--+ > |data_filtered | > +--+ > |{baz -> 32.0, foo -> 42.0}| > +--+ > Got: > +--+ > |data_filtered | > +--+ > |{foo -> 42.0, baz -> 32.0}| > +--+ > > ** > File > "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", > line 1465, in pyspark.sql.connect.functions.map_zip_with > Failed example: > df.select(map_zip_with( > "base", "ratio", lambda k, v1, v2: round(v1 * v2, > 2)).alias("updated_data") > ).show(truncate=False) > Expected: > +---+ > |updated_data | > +---+ > |{SALES -> 16.8, IT -> 48.0}| > +---+ > Got: > +---+ > |updated_data | > +---+ > |{IT -> 48.0, SALES -> 16.8}| > +---+ > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41988) Fix map_filter and map_zip_with output order
[ https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41988: Assignee: jiaan.geng > Fix map_filter and map_zip_with output order > > > Key: SPARK-41988 > URL: https://issues.apache.org/jira/browse/SPARK-41988 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > {code:java} > File > "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", > line 1423, in pyspark.sql.connect.functions.map_filter > Failed example: > df.select(map_filter( > "data", lambda _, v: v > 30.0).alias("data_filtered") > ).show(truncate=False) > Expected: > +--+ > |data_filtered | > +--+ > |{baz -> 32.0, foo -> 42.0}| > +--+ > Got: > +--+ > |data_filtered | > +--+ > |{foo -> 42.0, baz -> 32.0}| > +--+ > > ** > File > "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", > line 1465, in pyspark.sql.connect.functions.map_zip_with > Failed example: > df.select(map_zip_with( > "base", "ratio", lambda k, v1, v2: round(v1 * v2, > 2)).alias("updated_data") > ).show(truncate=False) > Expected: > +---+ > |updated_data | > +---+ > |{SALES -> 16.8, IT -> 48.0}| > +---+ > Got: > +---+ > |updated_data | > +---+ > |{IT -> 48.0, SALES -> 16.8}| > +---+ > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2
[ https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677256#comment-17677256 ] Dongjoon Hyun commented on SPARK-35801: --- Hi, [~viirya] and [~aokolnychyi]. Are we going to leave this `Unresolved` for Apache Spark 3.4.0? > SPIP: Row-level operations in Data Source V2 > > > Key: SPARK-35801 > URL: https://issues.apache.org/jira/browse/SPARK-35801 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Major > Labels: SPIP > > Row-level operations such as UPDATE, DELETE, and MERGE are becoming more and more > important for modern Big Data workflows. Use cases include, but are not > limited to, deleting a set of records for regulatory compliance, updating a > set of records to fix an issue in the ingestion pipeline, and applying changes from > a transaction log to a fact table. Row-level operations let users express > use cases that would otherwise require much more SQL. Common > patterns for updating partitions are read, union, and overwrite, or read, > diff, and append. With commands like MERGE, these operations are easier to > express and can be more efficient to run. > Hive supports [MERGE|https://blog.cloudera.com/update-hive-tables-easy-way/] > and Spark should implement similar support. > SPIP: > https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60
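The MERGE semantics the SPIP builds on can be sketched with a toy, engine-agnostic model (plain Python over lists of dicts; not the DSv2 API the SPIP proposes): matched target rows are updated, unmatched source rows are inserted.

```python
# Toy MERGE INTO over lists of dicts (illustrative only).
def merge_into(target, source, key):
    by_key = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in by_key:
            by_key[row[key]].update(row)   # WHEN MATCHED THEN UPDATE
        else:
            by_key[row[key]] = dict(row)   # WHEN NOT MATCHED THEN INSERT
    return sorted(by_key.values(), key=lambda r: r[key])

merged = merge_into(
    target=[{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}],
    source=[{"id": 1, "v": "new"}, {"id": 3, "v": "ins"}],
    key="id",
)
```

Expressed this way, a single MERGE replaces the read-union-overwrite (or read-diff-append) pattern the description mentions.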
[jira] [Commented] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677235#comment-17677235 ] jiaan.geng commented on SPARK-42032: After my investigation, the fact is the result of connect is the same as Dataset API. This is a bug of pyspark. cc [~podongfeng][~gurwls223] > Map data show in different order > > > Key: SPARK-42032 > URL: https://issues.apache.org/jira/browse/SPARK-42032 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > not sure whether this needs to be fixed: > {code:java} > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1623, in pyspark.sql.connect.functions.transform_keys > Failed example: > df.select(transform_keys( > "data", lambda k, _: upper(k)).alias("data_upper") > ).show(truncate=False) > Expected: > +-+ > |data_upper | > +-+ > |{BAR -> 2.0, FOO -> -2.0}| > +-+ > Got: > +-+ > |data_upper | > +-+ > |{FOO -> -2.0, BAR -> 2.0}| > +-+ > > ** > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", > line 1630, in pyspark.sql.connect.functions.transform_values > Failed example: > df.select(transform_values( > "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v) > ).alias("new_data")).show(truncate=False) > Expected: > +---+ > |new_data | > +---+ > |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}| > +---+ > Got: > +---+ > |new_data | > +---+ > |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}| > +---+ > > ** >1 of 2 in pyspark.sql.connect.functions.transform_keys >1 of 2 in pyspark.sql.connect.functions.transform_values > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
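Since map field order is not guaranteed, doctests like these should compare maps order-insensitively rather than by their rendered string. A sketch of one way to do that for the `{k -> v, ...}` rendering shown above (toy parser, not part of PySpark):

```python
# Toy parser for the "{k -> v, ...}" map rendering used in show() output;
# comparing the resulting dicts ignores field order.
def parse_map(s):
    body = s.strip()[1:-1]
    return {k.strip(): float(v)
            for k, v in (pair.split("->") for pair in body.split(","))}

expected = parse_map("{BAR -> 2.0, FOO -> -2.0}")
got = parse_map("{FOO -> -2.0, BAR -> 2.0}")   # same map, different order
```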
[jira] [Assigned] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error
[ https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42088: Assignee: Apache Spark > Running python3 setup.py sdist on windows reports a permission error > > > Key: SPARK-42088 > URL: https://issues.apache.org/jira/browse/SPARK-42088 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: zheju_he >Assignee: Apache Spark >Priority: Minor > > My system version is windows 10, and I can run setup.py with administrator > permissions, so there will be no error. However, it may be troublesome for us > to upgrade permissions with Windows Server, so we need to modify the code of > setup.py to ensure no error. To avoid the hassle of compiling for the user, I > suggest modifying the following code to enable the out-of-the-box effect > {code:python} > def _supports_symlinks(): > """Check if the system supports symlinks (e.g. *nix) or not.""" > return getattr(os, "symlink", None) is not None and > ctypes.windll.shell32.IsUserAnAdmin() != 0 if sys.platform == "win32" else > True > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error
[ https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677214#comment-17677214 ] zheju_he commented on SPARK-42088: -- This is my pr address https://github.com/apache/spark/pull/39603 > Running python3 setup.py sdist on windows reports a permission error > > > Key: SPARK-42088 > URL: https://issues.apache.org/jira/browse/SPARK-42088 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: zheju_he >Priority: Minor > > My system version is windows 10, and I can run setup.py with administrator > permissions, so there will be no error. However, it may be troublesome for us > to upgrade permissions with Windows Server, so we need to modify the code of > setup.py to ensure no error. To avoid the hassle of compiling for the user, I > suggest modifying the following code to enable the out-of-the-box effect > {code:python} > def _supports_symlinks(): > """Check if the system supports symlinks (e.g. *nix) or not.""" > return getattr(os, "symlink", None) is not None and > ctypes.windll.shell32.IsUserAnAdmin() != 0 if sys.platform == "win32" else > True > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42086) Sort test cases in SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42086: - Assignee: Dongjoon Hyun > Sort test cases in SQLQueryTestSuite > > > Key: SPARK-42086 > URL: https://issues.apache.org/jira/browse/SPARK-42086 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42086) Sort test cases in SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42086. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39599 [https://github.com/apache/spark/pull/39599] > Sort test cases in SQLQueryTestSuite > > > Key: SPARK-42086 > URL: https://issues.apache.org/jira/browse/SPARK-42086 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error
[ https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42088: Assignee: (was: Apache Spark) > Running python3 setup.py sdist on windows reports a permission error > > > Key: SPARK-42088 > URL: https://issues.apache.org/jira/browse/SPARK-42088 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: zheju_he >Priority: Minor > > My system version is windows 10, and I can run setup.py with administrator > permissions, so there will be no error. However, it may be troublesome for us > to upgrade permissions with Windows Server, so we need to modify the code of > setup.py to ensure no error. To avoid the hassle of compiling for the user, I > suggest modifying the following code to enable the out-of-the-box effect > {code:python} > def _supports_symlinks(): > """Check if the system supports symlinks (e.g. *nix) or not.""" > return getattr(os, "symlink", None) is not None and > ctypes.windll.shell32.IsUserAnAdmin() != 0 if sys.platform == "win32" else > True > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error
[ https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677216#comment-17677216 ] Apache Spark commented on SPARK-42088: -- User 'zekai-li' has created a pull request for this issue: https://github.com/apache/spark/pull/39603 > Running python3 setup.py sdist on windows reports a permission error > > > Key: SPARK-42088 > URL: https://issues.apache.org/jira/browse/SPARK-42088 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: zheju_he >Priority: Minor > > My system version is windows 10, and I can run setup.py with administrator > permissions, so there will be no error. However, it may be troublesome for us > to upgrade permissions with Windows Server, so we need to modify the code of > setup.py to ensure no error. To avoid the hassle of compiling for the user, I > suggest modifying the following code to enable the out-of-the-box effect > {code:python} > def _supports_symlinks(): > """Check if the system supports symlinks (e.g. *nix) or not.""" > return getattr(os, "symlink", None) is not None and > ctypes.windll.shell32.IsUserAnAdmin() != 0 if sys.platform == "win32" else > True > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error
zheju_he created SPARK-42088: Summary: Running python3 setup.py sdist on windows reports a permission error Key: SPARK-42088 URL: https://issues.apache.org/jira/browse/SPARK-42088 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.4.0 Reporter: zheju_he My system is Windows 10. Running setup.py with administrator permissions avoids the error, but elevating permissions can be troublesome on Windows Server, so setup.py should work without elevation. To spare users that hassle, I suggest modifying the following code so it works out of the box: {code:python} def _supports_symlinks(): """Check if the system supports symlinks (e.g. *nix) or not.""" return getattr(os, "symlink", None) is not None and ctypes.windll.shell32.IsUserAnAdmin() != 0 if sys.platform == "win32" else True {code}
[jira] [Resolved] (SPARK-42085) Make `from_arrow_schema` support nested types
[ https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42085. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39594 [https://github.com/apache/spark/pull/39594] > Make `from_arrow_schema` support nested types > - > > Key: SPARK-42085 > URL: https://issues.apache.org/jira/browse/SPARK-42085 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42085) Make `from_arrow_schema` support nested types
[ https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42085: - Assignee: Ruifeng Zheng > Make `from_arrow_schema` support nested types > - > > Key: SPARK-42085 > URL: https://issues.apache.org/jira/browse/SPARK-42085 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org