[jira] [Created] (SPARK-42089) Different result in nested lambda function

2023-01-16 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42089:
-

 Summary: Different result in nested lambda function
 Key: SPARK-42089
 URL: https://issues.apache.org/jira/browse/SPARK-42089
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng


test_nested_higher_order_function


{code:java}
Traceback (most recent call last):
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", line 814, in test_nested_higher_order_function
    self.assertEquals(actual, expected)
AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]

First differing element 0:
Row(n='a', l='a')
(1, 'a')

- [Row(n='a', l='a'),
-  Row(n='b', l='b'),
-  Row(n='c', l='c'),
-  Row(n='a', l='a'),
-  Row(n='b', l='b'),
-  Row(n='c', l='c'),
-  Row(n='a', l='a'),
-  Row(n='b', l='b'),
-  Row(n='c', l='c')]
+ [(1, 'a'),
+  (1, 'b'),
+  (1, 'c'),
+  (2, 'a'),
+  (2, 'b'),
+  (2, 'c'),
+  (3, 'a'),
+  (3, 'b'),
+  (3, 'c')]


{code}
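For context, the failing test nests one lambda inside another. A minimal sketch of the pattern (modeled on the test name and the diff above, assuming an active `spark` session; not copied verbatim from the suite):

{code:python}
# Pair every number with every letter via nested higher-order functions.
# The inner lambda must still be able to see the outer lambda variable.
from pyspark.sql.functions import flatten, struct, transform

df = spark.sql("SELECT array(1, 2, 3) AS numbers, array('a', 'b', 'c') AS letters")
actual = df.select(
    flatten(
        transform(
            "numbers",
            lambda number: transform(
                "letters",
                lambda letter: struct(number.alias("n"), letter.alias("l")),
            ),
        )
    ).alias("pairs")
).first()[0]
# Expected: [(1, 'a'), (1, 'b'), ..., (3, 'c')]. The failure above suggests the
# outer variable is mis-resolved under Spark Connect, so `n` takes the letter
# value and the result collapses to Row(n='a', l='a'), Row(n='b', l='b'), ...
{code}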







[jira] (SPARK-41471) SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2023-01-16 Thread Mars (Jira)


[ https://issues.apache.org/jira/browse/SPARK-41471 ]


Mars deleted comment on SPARK-41471:
--

was (Author: JIRAUSER290821):
[~csun] Hi, I want to take it :)

> SPJ: Reduce Spark shuffle when only one side of a join is 
> KeyGroupedPartitioning
> 
>
> Key: SPARK-41471
> URL: https://issues.apache.org/jira/browse/SPARK-41471
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Chao Sun
>Priority: Major
>
> When only one side of a SPJ (Storage-Partitioned Join) is 
> {{KeyGroupedPartitioning}}, Spark currently needs to shuffle both sides 
> using {{HashPartitioning}}. However, we may just need to shuffle the 
> other side according to the partition transforms defined in 
> {{KeyGroupedPartitioning}}. This is especially useful when the other side 
> is relatively small.
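A hedged PySpark illustration of the scenario (the catalog and table names below are made up):

{code:python}
# t1 is reported as KeyGroupedPartitioning (e.g. a DSv2 table bucketed by id);
# t2 is a relatively small table without that partitioning.
large = spark.table("cat.db.t1")
small = spark.table("cat.db.t2")

# Today the plan shuffles BOTH sides with HashPartitioning; the idea here is
# to shuffle only `small` to match t1's partition transforms, leaving t1 as-is.
large.join(small, "id").explain()
{code}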






[jira] [Comment Edited] (SPARK-38230) InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases

2023-01-16 Thread Xiaomin Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677162#comment-17677162
 ] 

Xiaomin Zhang edited comment on SPARK-38230 at 1/16/23 10:54 AM:
-

Hello [~coalchan], thanks for working on this. I created a PR based on your 
work, with some improvements as per [~Jackey Lee]'s comment.
[~roczei] Could you please review the PR and let me know if I missed anything? 
Thank you.


was (Author: ximz):
Hello [~coalchan], thanks for working on this. I created a PR based on your 
work, with some improvements as per [~Jackey Lee]'s comment. Now we don't need 
a new parameter, and Spark will only invoke listPartitions when overwriting 
Hive static partitions.
[~roczei] Could you please review the PR and let me know if I missed anything? 
Thank you.

> InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions 
> in most cases
> ---
>
> Key: SPARK-38230
> URL: https://issues.apache.org/jira/browse/SPARK-38230
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.2
>Reporter: Coal Chan
>Priority: Major
>
> In 
> `org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand`,
> `sparkSession.sessionState.catalog.listPartitions` calls the Hive metastore 
> client method `org.apache.hadoop.hive.metastore.listPartitionsPsWithAuth`, 
> which issues multiple queries per partition against the Hive metastore 
> database. So an insert into a table with many partitions (e.g. 10k) produces 
> a huge number of metastore queries (i.e. n * 10k), putting a lot of strain 
> on the database.
> In fact, `listPartitions` is only called to get partition locations and 
> compute `customPartitionLocations`. In most cases there are no custom 
> partition locations, so fetching just the partition names via 
> `listPartitionNames` would suffice.
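A hedged sketch of the kind of write that hits this path (the table name, the DataFrame `df`, and the partition count are made up for illustration):

{code:python}
# Overwriting a partitioned datasource table with ~10k partitions: the command
# lists all matching partitions (several metastore queries each) just to look
# for custom partition locations, although tables rarely have any.
df.write.mode("overwrite").insertInto("db.events")  # partitioned by `dt`
{code}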






[jira] [Resolved] (SPARK-41993) Move RowEncoder to AgnosticEncoders

2023-01-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41993.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39517
[https://github.com/apache/spark/pull/39517]

> Move RowEncoder to AgnosticEncoders
> ---
>
> Key: SPARK-41993
> URL: https://issues.apache.org/jira/browse/SPARK-41993
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.4.0
>
>
> Move RowEncoder to the AgnosticEncoder framework.






[jira] [Resolved] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.

2023-01-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42087.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39601
[https://github.com/apache/spark/pull/39601]

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
> ---
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42087) Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.

2023-01-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42087:
-

Assignee: Dongjoon Hyun

> Use `--no-same-owner` when HiveExternalCatalogVersionsSuite untars.
> ---
>
> Key: SPARK-42087
> URL: https://issues.apache.org/jira/browse/SPARK-42087
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>







[jira] [Resolved] (SPARK-36728) Can't create datetime object from anything other then year column Pyspark - koalas

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36728.
--
Resolution: Duplicate

Thanks for letting me know

> Can't create datetime object from anything other then year column Pyspark - 
> koalas
> --
>
> Key: SPARK-36728
> URL: https://issues.apache.org/jira/browse/SPARK-36728
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: pyspark_date.txt, pyspark_date2.txt
>
>
> If I create a datetime object, it must be from columns named 'year', 
> 'month', and 'day'.
>
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3],
>                    'day': [4, 5],
>                    'hour': [2, 3],
>                    'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
>
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
>
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3],
>                         'testday': [4, 5],
>                         'hour': [2, 3],
>                         'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480             if cond is None and limit is None and returns_series:
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352                 if not found:
>    1353                     if missing_keys is None:
> -> 1354                         raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                     else:
>    1356                         missing_keys.append(key)
> KeyError: "['testyear'] not in index"
>
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25
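For reference, pandas has the same requirement: the frame passed to to_datetime must use unit-like column names ('year', 'month', 'day', ...). A hedged sketch of the usual rename workaround, reusing the df_test above:

{code:python}
import pyspark.pandas as ps

# Rename the columns to the unit names to_datetime expects, then convert.
df_test['date'] = ps.to_datetime(
    df_test[['testyear', 'testmonth', 'testday']].rename(
        columns={'testyear': 'year', 'testmonth': 'month', 'testday': 'day'}
    )
)
{code}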






[jira] [Assigned] (SPARK-42032) Map data show in different order

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42032:


Assignee: Ruifeng Zheng

> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>
> not sure whether this needs to be fixed:
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
> **********************************************************************
>    1 of   2 in pyspark.sql.connect.functions.transform_keys
>    1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}
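Map field order carries no meaning in Spark, so one order-insensitive way to check such results (an illustrative sketch, not necessarily the fix adopted in the PR) is to compare sorted map entries instead of the rendered map:

{code:python}
from pyspark.sql.functions import map_entries, sort_array, transform_keys, upper

# Sorting the entries array gives a deterministic rendering regardless of the
# internal ordering the engine happens to produce.
df.select(
    sort_array(map_entries(transform_keys("data", lambda k, _: upper(k))))
    .alias("entries")
).show(truncate=False)
{code}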






[jira] [Resolved] (SPARK-42032) Map data show in different order

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42032.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39600
[https://github.com/apache/spark/pull/39600]

> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> not sure whether this needs to be fixed:
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
> **********************************************************************
>    1 of   2 in pyspark.sql.connect.functions.transform_keys
>    1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}






[jira] [Resolved] (SPARK-41988) Fix map_filter and map_zip_with output order

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41988.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39600
[https://github.com/apache/spark/pull/39600]

> Fix map_filter and map_zip_with output order
> 
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Expected:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{baz -> 32.0, foo -> 42.0}|
>     +--------------------------+
> Got:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{foo -> 42.0, baz -> 32.0}|
>     +--------------------------+
> **********************************************************************
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
> Failed example:
>     df.select(map_zip_with(
>         "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
>     ).show(truncate=False)
> Expected:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{SALES -> 16.8, IT -> 48.0}|
>     +---------------------------+
> Got:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{IT -> 48.0, SALES -> 16.8}|
>     +---------------------------+
> **********************************************************************
> {code}






[jira] [Assigned] (SPARK-41988) Fix map_filter and map_zip_with output order

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41988:


Assignee: jiaan.geng

> Fix map_filter and map_zip_with output order
> 
>
> Key: SPARK-41988
> URL: https://issues.apache.org/jira/browse/SPARK-41988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> {code:java}
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1423, in pyspark.sql.connect.functions.map_filter
> Failed example:
>     df.select(map_filter(
>         "data", lambda _, v: v > 30.0).alias("data_filtered")
>     ).show(truncate=False)
> Expected:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{baz -> 32.0, foo -> 42.0}|
>     +--------------------------+
> Got:
>     +--------------------------+
>     |data_filtered             |
>     +--------------------------+
>     |{foo -> 42.0, baz -> 32.0}|
>     +--------------------------+
> **********************************************************************
> File "/Users/jiaan.geng/git-local/github-forks/spark/python/pyspark/sql/connect/functions.py", line 1465, in pyspark.sql.connect.functions.map_zip_with
> Failed example:
>     df.select(map_zip_with(
>         "base", "ratio", lambda k, v1, v2: round(v1 * v2, 2)).alias("updated_data")
>     ).show(truncate=False)
> Expected:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{SALES -> 16.8, IT -> 48.0}|
>     +---------------------------+
> Got:
>     +---------------------------+
>     |updated_data               |
>     +---------------------------+
>     |{IT -> 48.0, SALES -> 16.8}|
>     +---------------------------+
> **********************************************************************
> {code}






[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2

2023-01-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677256#comment-17677256
 ] 

Dongjoon Hyun commented on SPARK-35801:
---

Hi, [~viirya] and [~aokolnychyi]. Are we going to leave this open as 
`Unresolved` for Apache Spark 3.4.0?

> SPIP: Row-level operations in Data Source V2
> 
>
> Key: SPARK-35801
> URL: https://issues.apache.org/jira/browse/SPARK-35801
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Anton Okolnychyi
>Priority: Major
>  Labels: SPIP
>
> Row-level operations such as UPDATE, DELETE, and MERGE are becoming more and 
> more important for modern Big Data workflows. Use cases include, but are not 
> limited to, deleting a set of records for regulatory compliance, updating a 
> set of records to fix an issue in the ingestion pipeline, and applying 
> changes from a transaction log to a fact table. Row-level operations let 
> users express use cases that would otherwise require much more SQL. Common 
> patterns for updating partitions are read-union-overwrite or 
> read-diff-append. With commands like MERGE, these operations are easier to 
> express and can be more efficient to run.
> Hive supports [MERGE|https://blog.cloudera.com/update-hive-tables-easy-way/] 
> and Spark should implement similar support.
> SPIP: 
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60
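For illustration, a hedged sketch of the kind of statement the SPIP targets (table names are made up; the MERGE form follows the common Hive/Delta syntax):

{code:python}
# Instead of rewriting whole partitions via read-union-overwrite, a row-level
# MERGE expresses the change directly and lets a DSv2 source execute it.
spark.sql("""
    MERGE INTO target t
    USING changes c
    ON t.id = c.id
    WHEN MATCHED AND c.deleted THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
{code}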






[jira] [Commented] (SPARK-42032) Map data show in different order

2023-01-16 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677235#comment-17677235
 ] 

jiaan.geng commented on SPARK-42032:


After investigating, I found that Spark Connect returns the same result as the 
Dataset API, so this is a bug in PySpark.
cc [~podongfeng] [~gurwls223]

> Map data show in different order
> 
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> not sure whether this needs to be fixed:
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
> **********************************************************************
>    1 of   2 in pyspark.sql.connect.functions.transform_keys
>    1 of   2 in pyspark.sql.connect.functions.transform_values
> {code}






[jira] [Assigned] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42088:


Assignee: Apache Spark

> Running python3 setup.py sdist on windows reports a permission error
> 
>
> Key: SPARK-42088
> URL: https://issues.apache.org/jira/browse/SPARK-42088
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: zheju_he
>Assignee: Apache Spark
>Priority: Minor
>
> My system is Windows 10, and when I run setup.py with administrator 
> permissions there is no error. However, elevating permissions can be 
> troublesome on Windows Server, so setup.py should be changed to run cleanly 
> without them. To spare users that hassle, I suggest modifying the following 
> code so it works out of the box:
> {code:python}
> def _supports_symlinks():
>     """Check if the system supports symlinks (e.g. *nix) or not."""
>     return (getattr(os, "symlink", None) is not None
>             and ctypes.windll.shell32.IsUserAnAdmin() != 0
>             if sys.platform == "win32" else True)
> {code}
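An equivalent but more explicit formulation of the proposed check (a readability sketch, not the exact wording of the PR):

{code:python}
import ctypes
import os
import sys

def _supports_symlinks():
    """Check if the system supports symlinks (e.g. *nix) or not."""
    if sys.platform == "win32":
        # On Windows, creating symlinks normally requires admin privileges,
        # so only report support when the process is elevated.
        return (getattr(os, "symlink", None) is not None
                and ctypes.windll.shell32.IsUserAnAdmin() != 0)
    return True
{code}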






[jira] [Commented] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error

2023-01-16 Thread zheju_he (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677214#comment-17677214
 ] 

zheju_he commented on SPARK-42088:
--

Here is my PR: https://github.com/apache/spark/pull/39603

> Running python3 setup.py sdist on windows reports a permission error
> 
>
> Key: SPARK-42088
> URL: https://issues.apache.org/jira/browse/SPARK-42088
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: zheju_he
>Priority: Minor
>
> My system is Windows 10, and when I run setup.py with administrator 
> permissions there is no error. However, elevating permissions can be 
> troublesome on Windows Server, so setup.py should be changed to run cleanly 
> without them. To spare users that hassle, I suggest modifying the following 
> code so it works out of the box:
> {code:python}
> def _supports_symlinks():
>     """Check if the system supports symlinks (e.g. *nix) or not."""
>     return (getattr(os, "symlink", None) is not None
>             and ctypes.windll.shell32.IsUserAnAdmin() != 0
>             if sys.platform == "win32" else True)
> {code}






[jira] [Assigned] (SPARK-42086) Sort test cases in SQLQueryTestSuite

2023-01-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42086:
-

Assignee: Dongjoon Hyun

> Sort test cases in SQLQueryTestSuite
> 
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>







[jira] [Resolved] (SPARK-42086) Sort test cases in SQLQueryTestSuite

2023-01-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42086.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39599
[https://github.com/apache/spark/pull/39599]

> Sort test cases in SQLQueryTestSuite
> 
>
> Key: SPARK-42086
> URL: https://issues.apache.org/jira/browse/SPARK-42086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42088:


Assignee: (was: Apache Spark)

> Running python3 setup.py sdist on windows reports a permission error
> 
>
> Key: SPARK-42088
> URL: https://issues.apache.org/jira/browse/SPARK-42088
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: zheju_he
>Priority: Minor
>
> My system is Windows 10, and when I run setup.py with administrator 
> permissions there is no error. However, elevating permissions can be 
> troublesome on Windows Server, so setup.py should be changed to run cleanly 
> without them. To spare users that hassle, I suggest modifying the following 
> code so it works out of the box:
> {code:python}
> def _supports_symlinks():
>     """Check if the system supports symlinks (e.g. *nix) or not."""
>     return (getattr(os, "symlink", None) is not None
>             and ctypes.windll.shell32.IsUserAnAdmin() != 0
>             if sys.platform == "win32" else True)
> {code}






[jira] [Commented] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677216#comment-17677216
 ] 

Apache Spark commented on SPARK-42088:
--

User 'zekai-li' has created a pull request for this issue:
https://github.com/apache/spark/pull/39603

> Running python3 setup.py sdist on windows reports a permission error
> 
>
> Key: SPARK-42088
> URL: https://issues.apache.org/jira/browse/SPARK-42088
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: zheju_he
>Priority: Minor
>
> My system is Windows 10, and when I run setup.py with administrator 
> permissions there is no error. However, elevating permissions can be 
> troublesome on Windows Server, so setup.py should be changed to run cleanly 
> without them. To spare users that hassle, I suggest modifying the following 
> code so it works out of the box:
> {code:python}
> def _supports_symlinks():
>     """Check if the system supports symlinks (e.g. *nix) or not."""
>     return (getattr(os, "symlink", None) is not None
>             and ctypes.windll.shell32.IsUserAnAdmin() != 0
>             if sys.platform == "win32" else True)
> {code}






[jira] [Created] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error

2023-01-16 Thread zheju_he (Jira)
zheju_he created SPARK-42088:


 Summary: Running python3 setup.py sdist on windows reports a 
permission error
 Key: SPARK-42088
 URL: https://issues.apache.org/jira/browse/SPARK-42088
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.4.0
Reporter: zheju_he


My system is Windows 10, and when I run setup.py with administrator 
permissions there is no error. However, elevating permissions can be 
troublesome on Windows Server, so setup.py should be changed to run cleanly 
without them. To spare users that hassle, I suggest modifying the following 
code so it works out of the box:
{code:python}
def _supports_symlinks():
    """Check if the system supports symlinks (e.g. *nix) or not."""
    return (getattr(os, "symlink", None) is not None
            and ctypes.windll.shell32.IsUserAnAdmin() != 0
            if sys.platform == "win32" else True)
{code}






[jira] [Resolved] (SPARK-42085) Make `from_arrow_schema` support nested types

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42085.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39594
[https://github.com/apache/spark/pull/39594]

> Make `from_arrow_schema` support nested types
> -
>
> Key: SPARK-42085
> URL: https://issues.apache.org/jira/browse/SPARK-42085
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42085) Make `from_arrow_schema` support nested types

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42085:
-

Assignee: Ruifeng Zheng

> Make `from_arrow_schema` support nested types
> -
>
> Key: SPARK-42085
> URL: https://issues.apache.org/jira/browse/SPARK-42085
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>






