[jira] [Commented] (SPARK-42567) Track state store provider load time and log warning if it exceeds a threshold

2023-02-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693354#comment-17693354
 ] 

Apache Spark commented on SPARK-42567:
--

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40163

> Track state store provider load time and log warning if it exceeds a threshold
> --
>
> Key: SPARK-42567
> URL: https://issues.apache.org/jira/browse/SPARK-42567
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.1
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Track state store provider load time and log warning if it exceeds a threshold
>  
> In some cases, filesystem initialization can take a noticeable amount of 
> time the first time we create the provider and initialize it. This change 
> will log the time taken if it exceeds a certain threshold.
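
A minimal sketch (editor's illustration, not the actual Spark change) of timing a provider load and warning past a threshold; the object name, method name, and threshold value are assumptions:
{code:scala}
import org.slf4j.LoggerFactory

object ProviderLoadTiming {
  private val log = LoggerFactory.getLogger(getClass)
  // Illustrative threshold; the real value/config name in the PR may differ.
  private val warnThresholdMs = 2000L

  // Runs the given load step, returns its result, and logs a warning if it
  // took longer than the threshold.
  def timedLoad[T](label: String)(loadProvider: => T): T = {
    val startNs = System.nanoTime()
    val provider = loadProvider
    val elapsedMs = (System.nanoTime() - startNs) / 1000000L
    if (elapsedMs > warnThresholdMs) {
      log.warn(s"Loading state store provider $label took $elapsedMs ms " +
        s"(threshold $warnThresholdMs ms)")
    }
    provider
  }
}
{code}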



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42566) RocksDB StateStore lock acquisition should happen after getting input iterator from inputRDD

2023-02-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693329#comment-17693329
 ] 

Apache Spark commented on SPARK-42566:
--

User 'huanliwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40162

> RocksDB StateStore lock acquisition should happen after getting input 
> iterator from inputRDD
> 
>
> Key: SPARK-42566
> URL: https://issues.apache.org/jira/browse/SPARK-42566
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
> The current behavior of the `{*}compute{*}` method in both 
> `{*}StateStoreRDD{*}` and `{*}ReadStateStoreRDD{*}` is to first get the 
> state store instance and then get the input iterator from the inputRDD.
> For the RocksDB state store, the running task acquires and holds the lock on 
> this instance. If there are network issues, a retried or speculative task 
> will fail to acquire the lock and eventually abort the job. For example, 
> when we shrink the executors, the surviving executor will try to fetch data 
> from the killed ones because it doesn't know that the target location 
> (prefetched from the driver) is dead until it actually tries to fetch the 
> data. The query might hang for a long time because the executor retries 
> {{*spark.shuffle.io.maxRetries=3*}} times and, for each retry, waits up to 
> {{*spark.shuffle.io.connectionTimeout*}} (default 120s) before timing out. 
> In total, the task could hang for about 6 minutes, and the retried or 
> speculative tasks won't be able to acquire the lock during this period.
> Acquiring the lock only after retrieving the input iterator should avoid 
> this situation.
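
A self-contained illustration of the ordering change (not the actual StateStoreRDD code): the potentially slow input fetch happens before the per-instance lock is taken, so a stuck fetch no longer blocks a retried or speculative attempt from acquiring the lock.
{code:scala}
import java.util.concurrent.locks.ReentrantLock

final class InstanceLockExample(lock: ReentrantLock) {

  // Old order: take the instance lock first, then perform the possibly slow
  // input fetch while holding it.
  def processOld[A](loadInput: () => Iterator[A])(use: Iterator[A] => Unit): Unit = {
    lock.lock()
    try use(loadInput())
    finally lock.unlock()
  }

  // Proposed order: perform the possibly slow input fetch without the lock,
  // then take the lock only for the state-store work itself.
  def processNew[A](loadInput: () => Iterator[A])(use: Iterator[A] => Unit): Unit = {
    val input = loadInput()
    lock.lock()
    try use(input)
    finally lock.unlock()
  }
}
{code}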



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42566) RocksDB StateStore lock acquisition should happen after getting input iterator from inputRDD

2023-02-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42566:


Assignee: Apache Spark

> RocksDB StateStore lock acquisition should happen after getting input 
> iterator from inputRDD
> 
>
> Key: SPARK-42566
> URL: https://issues.apache.org/jira/browse/SPARK-42566
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Assignee: Apache Spark
>Priority: Minor
>
> The current behavior of the `{*}compute{*}` method in both 
> `{*}StateStoreRDD{*}` and `{*}ReadStateStoreRDD{*}` is to first get the 
> state store instance and then get the input iterator from the inputRDD.
> For the RocksDB state store, the running task acquires and holds the lock on 
> this instance. If there are network issues, a retried or speculative task 
> will fail to acquire the lock and eventually abort the job. For example, 
> when we shrink the executors, the surviving executor will try to fetch data 
> from the killed ones because it doesn't know that the target location 
> (prefetched from the driver) is dead until it actually tries to fetch the 
> data. The query might hang for a long time because the executor retries 
> {{*spark.shuffle.io.maxRetries=3*}} times and, for each retry, waits up to 
> {{*spark.shuffle.io.connectionTimeout*}} (default 120s) before timing out. 
> In total, the task could hang for about 6 minutes, and the retried or 
> speculative tasks won't be able to acquire the lock during this period.
> Acquiring the lock only after retrieving the input iterator should avoid 
> this situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42566) RocksDB StateStore lock acquisition should happen after getting input iterator from inputRDD

2023-02-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42566:


Assignee: (was: Apache Spark)

> RocksDB StateStore lock acquisition should happen after getting input 
> iterator from inputRDD
> 
>
> Key: SPARK-42566
> URL: https://issues.apache.org/jira/browse/SPARK-42566
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
> The current behavior of the `{*}compute{*}` method in both 
> `{*}StateStoreRDD{*}` and `{*}ReadStateStoreRDD{*}` is to first get the 
> state store instance and then get the input iterator from the inputRDD.
> For the RocksDB state store, the running task acquires and holds the lock on 
> this instance. If there are network issues, a retried or speculative task 
> will fail to acquire the lock and eventually abort the job. For example, 
> when we shrink the executors, the surviving executor will try to fetch data 
> from the killed ones because it doesn't know that the target location 
> (prefetched from the driver) is dead until it actually tries to fetch the 
> data. The query might hang for a long time because the executor retries 
> {{*spark.shuffle.io.maxRetries=3*}} times and, for each retry, waits up to 
> {{*spark.shuffle.io.connectionTimeout*}} (default 120s) before timing out. 
> In total, the task could hang for about 6 minutes, and the retried or 
> speculative tasks won't be able to acquire the lock during this period.
> Acquiring the lock only after retrieving the input iterator should avoid 
> this situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42565) Error log improvement for the lock acquisition of RocksDB state store instance

2023-02-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42565:


Assignee: Apache Spark

> Error log improvement for the lock acquisition of RocksDB state store 
> instance
> ---
>
> Key: SPARK-42565
> URL: https://issues.apache.org/jira/browse/SPARK-42565
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Assignee: Apache Spark
>Priority: Minor
>
>  
> {code:java}
> "23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
> "23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default): 
> RocksDB instance could not be acquired by [ThreadId: Some(49), task: 3.0 in 
> stage 57, TID 363] as it was not released by [ThreadId: Some(51), task: 3.1 
> in stage 57, TID 342] after 60002 ms.{code}
>  
> We are seeing these error messages for a test query. The *taskId* is not the 
> same as the *partitionId*, but the error log does not make this clear.
> The logs are confusing: the second entry seems to refer to `{*}task 3.0{*}` 
> (it is actually partition 3 and retry attempt 0), but `{*}TID 363{*}` is 
> already occupied by `{*}task 2.0 in stage 57.1{*}`.
>  
> It is also unclear in which stage attempt the lock is acquired (or fails to 
> be acquired).
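
A hedged sketch of a less ambiguous message format; the case class and field names below are invented for illustration and are not the actual RocksDB state store code:
{code:scala}
object RocksDBLockMessages {
  // Spells out partition id, stage attempt, and task attempt explicitly
  // instead of the ambiguous "task 3.0" form used today.
  final case class LockHolder(
      threadId: Long,
      partitionId: Int,
      stageId: Int,
      stageAttempt: Int,
      taskAttempt: Int,
      taskId: Long) {
    def describe: String =
      s"[thread=$threadId, partition=$partitionId, " +
        s"stage=$stageId.$stageAttempt, taskAttempt=$taskAttempt, TID=$taskId]"
  }

  def lockNotReleased(waiter: LockHolder, holder: LockHolder, waitMs: Long): String =
    s"RocksDB instance could not be acquired by ${waiter.describe} " +
      s"as it was not released by ${holder.describe} after $waitMs ms."
}
{code}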



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42565) Error log improvement for the lock acquisition of RocksDB state store instance

2023-02-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693320#comment-17693320
 ] 

Apache Spark commented on SPARK-42565:
--

User 'huanliwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40161

> Error log improvement for the lock acquisition of RocksDB state store 
> instance
> ---
>
> Key: SPARK-42565
> URL: https://issues.apache.org/jira/browse/SPARK-42565
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
>  
> {code:java}
> "23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
> "23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default): 
> RocksDB instance could not be acquired by [ThreadId: Some(49), task: 3.0 in 
> stage 57, TID 363] as it was not released by [ThreadId: Some(51), task: 3.1 
> in stage 57, TID 342] after 60002 ms.{code}
>  
> We are seeing these error messages for a test query. The *taskId* is not the 
> same as the *partitionId*, but the error log does not make this clear.
> The logs are confusing: the second entry seems to refer to `{*}task 3.0{*}` 
> (it is actually partition 3 and retry attempt 0), but `{*}TID 363{*}` is 
> already occupied by `{*}task 2.0 in stage 57.1{*}`.
>  
> It is also unclear in which stage attempt the lock is acquired (or fails to 
> be acquired).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42565) Error log improvement for the lock acquisition of RocksDB state store instance

2023-02-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42565:


Assignee: (was: Apache Spark)

> Error log improvement for the lock acquisition of RocksDB state store 
> instance
> ---
>
> Key: SPARK-42565
> URL: https://issues.apache.org/jira/browse/SPARK-42565
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
>  
> {code:java}
> "23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
> "23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default): 
> RocksDB instance could not be acquired by [ThreadId: Some(49), task: 3.0 in 
> stage 57, TID 363] as it was not released by [ThreadId: Some(51), task: 3.1 
> in stage 57, TID 342] after 60002 ms.{code}
>  
> We are seeing these error messages for a test query. The *taskId* is not the 
> same as the *partitionId*, but the error log does not make this clear.
> The logs are confusing: the second entry seems to refer to `{*}task 3.0{*}` 
> (it is actually partition 3 and retry attempt 0), but `{*}TID 363{*}` is 
> already occupied by `{*}task 2.0 in stage 57.1{*}`.
>  
> It is also unclear in which stage attempt the lock is acquired (or fails to 
> be acquired).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42509) WindowGroupLimitExec supports codegen

2023-02-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693180#comment-17693180
 ] 

Apache Spark commented on SPARK-42509:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40159

> WindowGroupLimitExec supports codegen
> -
>
> Key: SPARK-42509
> URL: https://issues.apache.org/jira/browse/SPARK-42509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42488) Upgrade commons-crypto from 1.1.0 to 1.2.0

2023-02-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42488:


Assignee: (was: Apache Spark)

> Upgrade commons-crypto from 1.1.0 to 1.2.0
> --
>
> Key: SPARK-42488
> URL: https://issues.apache.org/jira/browse/SPARK-42488
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/apache/commons-crypto/compare/rel/commons-crypto-1.1.0...rel/commons-crypto-1.2.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42488) Upgrade commons-crypto from 1.1.0 to 1.2.0

2023-02-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42488:


Assignee: Apache Spark

> Upgrade commons-crypto from 1.1.0 to 1.2.0
> --
>
> Key: SPARK-42488
> URL: https://issues.apache.org/jira/browse/SPARK-42488
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> https://github.com/apache/commons-crypto/compare/rel/commons-crypto-1.1.0...rel/commons-crypto-1.2.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42551) Support subexpression elimination in FilterExec

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693038#comment-17693038
 ] 

Apache Spark commented on SPARK-42551:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/40157

> Support subexpression elimination in FilterExec
> ---
>
> Key: SPARK-42551
> URL: https://issues.apache.org/jira/browse/SPARK-42551
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Wan Kun
>Priority: Major
>
> Just like SPARK-33092, we can support subexpression elimination in FilterExec 
> in whole-stage codegen.
> For example:
> {code:sql}
> SELECT * FROM (
>   SELECT v, v * v + 1 v1 from values(1) as t2(v)
> ) t
> where v > 0 and v1 > 5 and v1 < 10
> {code}
> Codegen plan:
> {code:java}
> *(1) Project [v#1, ((v#1 * v#1) + 1) AS v1#0]
> +- *(1) Filter (((v#1 > 0) AND (((v#1 * v#1) + 1) > 5)) AND (((v#1 * v#1) + 
> 1) < 10))
>    +- *(1) LocalTableScan [v#1]
> {code}
> The subexpression *(v#1 * v#1) + 1* is evaluated twice.
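
To see the duplicated subexpression yourself, the query above can be run locally and its generated code inspected; a sketch assuming a local SparkSession (the exact plan text varies by Spark version):
{code:scala}
import org.apache.spark.sql.SparkSession

object SubexprFilterDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("subexpr-demo").getOrCreate()

    val df = spark.sql(
      """SELECT * FROM (
        |  SELECT v, v * v + 1 v1 FROM VALUES (1) AS t2(v)
        |) t
        |WHERE v > 0 AND v1 > 5 AND v1 < 10""".stripMargin)

    // "codegen" mode prints the whole-stage generated Java in addition to the
    // physical plan, which is where the repeated (v * v) + 1 evaluation (or
    // its elimination) can be observed.
    df.explain("codegen")

    spark.stop()
  }
}
{code}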



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42551) Support subexpression elimination in FilterExec

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42551:


Assignee: (was: Apache Spark)

> Support subexpression elimination in FilterExec
> ---
>
> Key: SPARK-42551
> URL: https://issues.apache.org/jira/browse/SPARK-42551
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Wan Kun
>Priority: Major
>
> Just like SPARK-33092, we can support subexpression elimination in FilterExec 
> in whole-stage codegen.
> For example:
> {code:sql}
> SELECT * FROM (
>   SELECT v, v * v + 1 v1 from values(1) as t2(v)
> ) t
> where v > 0 and v1 > 5 and v1 < 10
> {code}
> Codegen plan:
> {code:java}
> *(1) Project [v#1, ((v#1 * v#1) + 1) AS v1#0]
> +- *(1) Filter (((v#1 > 0) AND (((v#1 * v#1) + 1) > 5)) AND (((v#1 * v#1) + 
> 1) < 10))
>    +- *(1) LocalTableScan [v#1]
> {code}
> The subexpression *(v#1 * v#1) + 1* is evaluated twice.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42551) Support subexpression elimination in FilterExec

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42551:


Assignee: Apache Spark

> Support subexpression elimination in FilterExec
> ---
>
> Key: SPARK-42551
> URL: https://issues.apache.org/jira/browse/SPARK-42551
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Wan Kun
>Assignee: Apache Spark
>Priority: Major
>
> Just like SPARK-33092, we can support subexpression elimination in FilterExec 
> in whole-stage codegen.
> For example:
> {code:sql}
> SELECT * FROM (
>   SELECT v, v * v + 1 v1 from values(1) as t2(v)
> ) t
> where v > 0 and v1 > 5 and v1 < 10
> {code}
> Codegen plan:
> {code:java}
> *(1) Project [v#1, ((v#1 * v#1) + 1) AS v1#0]
> +- *(1) Filter (((v#1 > 0) AND (((v#1 * v#1) + 1) > 5)) AND (((v#1 * v#1) + 
> 1) < 10))
>    +- *(1) LocalTableScan [v#1]
> {code}
> The subexpression *(v#1 * v#1) + 1* is evaluated twice.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693005#comment-17693005
 ] 

Apache Spark commented on SPARK-41823:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40156

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42534) Fix DB2 Limit clause

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693004#comment-17693004
 ] 

Apache Spark commented on SPARK-42534:
--

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40155

> Fix DB2 Limit clause
> 
>
> Key: SPARK-42534
> URL: https://issues.apache.org/jira/browse/SPARK-42534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ivan Sadikov
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42548) Add PlainReferences to skip rewriting attributes

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42548:


Assignee: Apache Spark

> Add PlainReferences to skip rewriting attributes
> 
>
> Key: SPARK-42548
> URL: https://issues.apache.org/jira/browse/SPARK-42548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42548) Add PlainReferences to skip rewriting attributes

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42548:


Assignee: (was: Apache Spark)

> Add PlainReferences to skip rewriting attributes
> 
>
> Key: SPARK-42548
> URL: https://issues.apache.org/jira/browse/SPARK-42548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42548) Add PlainReferences to skip rewriting attributes

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692992#comment-17692992
 ] 

Apache Spark commented on SPARK-42548:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40154

> Add PlainReferences to skip rewriting attributes
> 
>
> Key: SPARK-42548
> URL: https://issues.apache.org/jira/browse/SPARK-42548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42547) Make PySpark working with Python 3.7

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42547:


Assignee: (was: Apache Spark)

> Make PySpark working with Python 3.7
> 
>
> Key: SPARK-42547
> URL: https://issues.apache.org/jira/browse/SPARK-42547
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> {code}
> + ./python/run-tests --python-executables=python3
> Running PySpark tests. Output is in /home/ec2-user/spark/python/unit-tests.log
> Will test against the following Python executables: ['python3']
> Will test the following Python modules: ['pyspark-connect', 'pyspark-core', 
> 'pyspark-errors', 'pyspark-ml', 'pyspark-mllib', 'pyspark-pandas', 
> 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
> python3 python_implementation is CPython
> python3 version is: Python 3.7.16
> Starting test(python3): pyspark.ml.tests.test_feature (temp output: 
> /home/ec2-user/spark/python/target/8ca9ab1a-05cc-4845-bf89-30d9001510bc/python3__pyspark.ml.tests.test_feature__kg6sseie.log)
> Starting test(python3): pyspark.ml.tests.test_base (temp output: 
> /home/ec2-user/spark/python/target/f2264f3b-6b26-4e61-9452-8d6ddd7eb002/python3__pyspark.ml.tests.test_base__0902zf9_.log)
> Starting test(python3): pyspark.ml.tests.test_algorithms (temp output: 
> /home/ec2-user/spark/python/target/d1dc4e07-e58c-4c03-abe5-09d8fab22e6a/python3__pyspark.ml.tests.test_algorithms__lh3wb2u8.log)
> Starting test(python3): pyspark.ml.tests.test_evaluation (temp output: 
> /home/ec2-user/spark/python/target/3f42dc79-c945-4cf2-a1eb-83e72b40a9ee/python3__pyspark.ml.tests.test_evaluation__89idc7fa.log)
> Finished test(python3): pyspark.ml.tests.test_base (16s)
> Starting test(python3): pyspark.ml.tests.test_functions (temp output: 
> /home/ec2-user/spark/python/target/5a3b90f0-216b-4edd-9d15-6619d3e03300/python3__pyspark.ml.tests.test_functions__g5u1290s.log)
> Traceback (most recent call last):
>   File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File "/home/ec2-user/spark/python/pyspark/ml/tests/test_functions.py", line 
> 21, in <module>
> from pyspark.ml.functions import predict_batch_udf
>   File "/home/ec2-user/spark/python/pyspark/ml/functions.py", line 38, in 
> <module>
> from typing import Any, Callable, Iterator, List, Mapping, Protocol, 
> TYPE_CHECKING, Tuple, Union
> ImportError: cannot import name 'Protocol' from 'typing' 
> (/usr/lib64/python3.7/typing.py)
> Had test failures in pyspark.ml.tests.test_functions with python3; see logs.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42547) Make PySpark working with Python 3.7

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42547:


Assignee: Apache Spark

> Make PySpark working with Python 3.7
> 
>
> Key: SPARK-42547
> URL: https://issues.apache.org/jira/browse/SPARK-42547
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Blocker
>
> {code}
> + ./python/run-tests --python-executables=python3
> Running PySpark tests. Output is in /home/ec2-user/spark/python/unit-tests.log
> Will test against the following Python executables: ['python3']
> Will test the following Python modules: ['pyspark-connect', 'pyspark-core', 
> 'pyspark-errors', 'pyspark-ml', 'pyspark-mllib', 'pyspark-pandas', 
> 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
> python3 python_implementation is CPython
> python3 version is: Python 3.7.16
> Starting test(python3): pyspark.ml.tests.test_feature (temp output: 
> /home/ec2-user/spark/python/target/8ca9ab1a-05cc-4845-bf89-30d9001510bc/python3__pyspark.ml.tests.test_feature__kg6sseie.log)
> Starting test(python3): pyspark.ml.tests.test_base (temp output: 
> /home/ec2-user/spark/python/target/f2264f3b-6b26-4e61-9452-8d6ddd7eb002/python3__pyspark.ml.tests.test_base__0902zf9_.log)
> Starting test(python3): pyspark.ml.tests.test_algorithms (temp output: 
> /home/ec2-user/spark/python/target/d1dc4e07-e58c-4c03-abe5-09d8fab22e6a/python3__pyspark.ml.tests.test_algorithms__lh3wb2u8.log)
> Starting test(python3): pyspark.ml.tests.test_evaluation (temp output: 
> /home/ec2-user/spark/python/target/3f42dc79-c945-4cf2-a1eb-83e72b40a9ee/python3__pyspark.ml.tests.test_evaluation__89idc7fa.log)
> Finished test(python3): pyspark.ml.tests.test_base (16s)
> Starting test(python3): pyspark.ml.tests.test_functions (temp output: 
> /home/ec2-user/spark/python/target/5a3b90f0-216b-4edd-9d15-6619d3e03300/python3__pyspark.ml.tests.test_functions__g5u1290s.log)
> Traceback (most recent call last):
>   File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File "/home/ec2-user/spark/python/pyspark/ml/tests/test_functions.py", line 
> 21, in <module>
> from pyspark.ml.functions import predict_batch_udf
>   File "/home/ec2-user/spark/python/pyspark/ml/functions.py", line 38, in 
> <module>
> from typing import Any, Callable, Iterator, List, Mapping, Protocol, 
> TYPE_CHECKING, Tuple, Union
> ImportError: cannot import name 'Protocol' from 'typing' 
> (/usr/lib64/python3.7/typing.py)
> Had test failures in pyspark.ml.tests.test_functions with python3; see logs.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42547) Make PySpark working with Python 3.7

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692988#comment-17692988
 ] 

Apache Spark commented on SPARK-42547:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40153

> Make PySpark working with Python 3.7
> 
>
> Key: SPARK-42547
> URL: https://issues.apache.org/jira/browse/SPARK-42547
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> {code}
> + ./python/run-tests --python-executables=python3
> Running PySpark tests. Output is in /home/ec2-user/spark/python/unit-tests.log
> Will test against the following Python executables: ['python3']
> Will test the following Python modules: ['pyspark-connect', 'pyspark-core', 
> 'pyspark-errors', 'pyspark-ml', 'pyspark-mllib', 'pyspark-pandas', 
> 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
> python3 python_implementation is CPython
> python3 version is: Python 3.7.16
> Starting test(python3): pyspark.ml.tests.test_feature (temp output: 
> /home/ec2-user/spark/python/target/8ca9ab1a-05cc-4845-bf89-30d9001510bc/python3__pyspark.ml.tests.test_feature__kg6sseie.log)
> Starting test(python3): pyspark.ml.tests.test_base (temp output: 
> /home/ec2-user/spark/python/target/f2264f3b-6b26-4e61-9452-8d6ddd7eb002/python3__pyspark.ml.tests.test_base__0902zf9_.log)
> Starting test(python3): pyspark.ml.tests.test_algorithms (temp output: 
> /home/ec2-user/spark/python/target/d1dc4e07-e58c-4c03-abe5-09d8fab22e6a/python3__pyspark.ml.tests.test_algorithms__lh3wb2u8.log)
> Starting test(python3): pyspark.ml.tests.test_evaluation (temp output: 
> /home/ec2-user/spark/python/target/3f42dc79-c945-4cf2-a1eb-83e72b40a9ee/python3__pyspark.ml.tests.test_evaluation__89idc7fa.log)
> Finished test(python3): pyspark.ml.tests.test_base (16s)
> Starting test(python3): pyspark.ml.tests.test_functions (temp output: 
> /home/ec2-user/spark/python/target/5a3b90f0-216b-4edd-9d15-6619d3e03300/python3__pyspark.ml.tests.test_functions__g5u1290s.log)
> Traceback (most recent call last):
>   File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File "/home/ec2-user/spark/python/pyspark/ml/tests/test_functions.py", line 
> 21, in <module>
> from pyspark.ml.functions import predict_batch_udf
>   File "/home/ec2-user/spark/python/pyspark/ml/functions.py", line 38, in 
> <module>
> from typing import Any, Callable, Iterator, List, Mapping, Protocol, 
> TYPE_CHECKING, Tuple, Union
> ImportError: cannot import name 'Protocol' from 'typing' 
> (/usr/lib64/python3.7/typing.py)
> Had test failures in pyspark.ml.tests.test_functions with python3; see logs.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42545) Remove `experimental` from Volcano docs

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42545:


Assignee: Apache Spark

> Remove `experimental` from Volcano docs
> ---
>
> Key: SPARK-42545
> URL: https://issues.apache.org/jira/browse/SPARK-42545
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42545) Remove `experimental` from Volcano docs

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692963#comment-17692963
 ] 

Apache Spark commented on SPARK-42545:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40152

> Remove `experimental` from Volcano docs
> ---
>
> Key: SPARK-42545
> URL: https://issues.apache.org/jira/browse/SPARK-42545
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42545) Remove `experimental` from Volcano docs

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42545:


Assignee: (was: Apache Spark)

> Remove `experimental` from Volcano docs
> ---
>
> Key: SPARK-42545
> URL: https://issues.apache.org/jira/browse/SPARK-42545
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42121) Add built-in table-valued functions posexplode and posexplode_outer

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42121:


Assignee: Apache Spark

> Add built-in table-valued functions posexplode and posexplode_outer
> ---
>
> Key: SPARK-42121
> URL: https://issues.apache.org/jira/browse/SPARK-42121
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> Add `posexplode` and `posexplode_outer` to the built-in table function 
> registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
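
For context, a small spark-shell illustration of the target behavior once `posexplode` is registered as a table-valued function; this assumes the feature is in place, and `spark` is the shell's predefined session:
{code:scala}
// Assumes this ticket's change: posexplode usable directly in the FROM clause.
spark.sql("SELECT * FROM posexplode(array(10, 20, 30))").show()
// Expected shape (pos/col are posexplode's standard output columns):
// +---+---+
// |pos|col|
// +---+---+
// |  0| 10|
// |  1| 20|
// |  2| 30|
// +---+---+
{code}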



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42121) Add built-in table-valued functions posexplode and posexplode_outer

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42121:


Assignee: (was: Apache Spark)

> Add built-in table-valued functions posexplode and posexplode_outer
> ---
>
> Key: SPARK-42121
> URL: https://issues.apache.org/jira/browse/SPARK-42121
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Add `posexplode` and `posexplode_outer` to the built-in table function 
> registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42121) Add built-in table-valued functions posexplode and posexplode_outer

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692961#comment-17692961
 ] 

Apache Spark commented on SPARK-42121:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40151

> Add built-in table-valued functions posexplode and posexplode_outer
> ---
>
> Key: SPARK-42121
> URL: https://issues.apache.org/jira/browse/SPARK-42121
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Add `posexplode` and `posexplode_outer` to the built-in table function 
> registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41834) Implement SparkSession.conf

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692947#comment-17692947
 ] 

Apache Spark commented on SPARK-41834:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40150

> Implement SparkSession.conf
> ---
>
> Key: SPARK-41834
> URL: https://issues.apache.org/jira/browse/SPARK-41834
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2119, in pyspark.sql.connect.functions.unix_timestamp
> Failed example:
>     spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>     AttributeError: 'SparkSession' object has no attribute 'conf'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41834) Implement SparkSession.conf

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41834:


Assignee: Apache Spark

> Implement SparkSession.conf
> ---
>
> Key: SPARK-41834
> URL: https://issues.apache.org/jira/browse/SPARK-41834
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2119, in pyspark.sql.connect.functions.unix_timestamp
> Failed example:
>     spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>     AttributeError: 'SparkSession' object has no attribute 'conf'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41834) Implement SparkSession.conf

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41834:


Assignee: (was: Apache Spark)

> Implement SparkSession.conf
> ---
>
> Key: SPARK-41834
> URL: https://issues.apache.org/jira/browse/SPARK-41834
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2119, in pyspark.sql.connect.functions.unix_timestamp
> Failed example:
>     spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>     AttributeError: 'SparkSession' object has no attribute 'conf'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41834) Implement SparkSession.conf

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692946#comment-17692946
 ] 

Apache Spark commented on SPARK-41834:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40150

> Implement SparkSession.conf
> ---
>
> Key: SPARK-41834
> URL: https://issues.apache.org/jira/browse/SPARK-41834
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2119, in pyspark.sql.connect.functions.unix_timestamp
> Failed example:
>     spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>     AttributeError: 'SparkSession' object has no attribute 'conf'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42122) Add built-in table-valued function stack

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42122:


Assignee: (was: Apache Spark)

> Add built-in table-valued function stack
> 
>
> Key: SPARK-42122
> URL: https://issues.apache.org/jira/browse/SPARK-42122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Add `stack` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
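
For context, a small spark-shell illustration of the target behavior once `stack` is registered as a table-valued function; this assumes the feature is in place, and `spark` is the shell's predefined session:
{code:scala}
// Assumes this ticket's change: stack usable directly in the FROM clause.
// stack(n, v1, v2, ...) lays the values out as n rows.
spark.sql("SELECT * FROM stack(2, 1, 'a', 2, 'b')").show()
// +----+----+
// |col0|col1|
// +----+----+
// |   1|   a|
// |   2|   b|
// +----+----+
{code}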



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42122) Add built-in table-valued function stack

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692945#comment-17692945
 ] 

Apache Spark commented on SPARK-42122:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40149

> Add built-in table-valued function stack
> 
>
> Key: SPARK-42122
> URL: https://issues.apache.org/jira/browse/SPARK-42122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Add `stack` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42122) Add built-in table-valued function stack

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42122:


Assignee: Apache Spark

> Add built-in table-valued function stack
> 
>
> Key: SPARK-42122
> URL: https://issues.apache.org/jira/browse/SPARK-42122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> Add `stack` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42544) Spark Connect Scala Client: support parameterized SQL

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692937#comment-17692937
 ] 

Apache Spark commented on SPARK-42544:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40148

> Spark Connect Scala Client: support parameterized SQL
> -
>
> Key: SPARK-42544
> URL: https://issues.apache.org/jira/browse/SPARK-42544
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42544) Spark Connect Scala Client: support parameterized SQL

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42544:


Assignee: Apache Spark  (was: Rui Wang)

> Spark Connect Scala Client: support parameterized SQL
> -
>
> Key: SPARK-42544
> URL: https://issues.apache.org/jira/browse/SPARK-42544
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42544) Spark Connect Scala Client: support parameterized SQL

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42544:


Assignee: Rui Wang  (was: Apache Spark)

> Spark Connect Scala Client: support parameterized SQL
> -
>
> Key: SPARK-42544
> URL: https://issues.apache.org/jira/browse/SPARK-42544
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42544) Spark Connect Scala Client: support parameterized SQL

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692936#comment-17692936
 ] 

Apache Spark commented on SPARK-42544:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40148

> Spark Connect Scala Client: support parameterized SQL
> -
>
> Key: SPARK-42544
> URL: https://issues.apache.org/jira/browse/SPARK-42544
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692924#comment-17692924
 ] 

Apache Spark commented on SPARK-42543:
--

User 'vicennial' has created a pull request for this issue:
https://github.com/apache/spark/pull/40147

> Specify protocol for UDF artifact transfer in JVM/Scala client 
> ---
>
> Key: SPARK-42543
> URL: https://issues.apache.org/jira/browse/SPARK-42543
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> An "artifact" is any file that may be used during the execution of a UDF.
> In the decoupled client-server architecture of Spark Connect, a remote client 
> may use a local JAR or a new class in their UDF that may not be present on 
> the server. To handle these cases of missing "artifacts", a protocol for 
> artifact transfer is needed to move the required artifacts from the client 
> side over to the server side.
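
As a purely hypothetical sketch of the kind of client-side surface such a transfer mechanism implies (none of these names come from the actual Spark Connect protocol or its PR):
{code:scala}
import java.nio.file.Path

// Hypothetical shapes only, to make the requirement concrete; the real
// message/RPC definitions live in this ticket's PR.
sealed trait Artifact { def name: String }
final case class JarArtifact(name: String, path: Path) extends Artifact
final case class ClassFileArtifact(name: String, bytes: Array[Byte]) extends Artifact

trait ArtifactClient {
  // Upload artifacts the server does not yet have, before running a UDF
  // that depends on them.
  def addArtifacts(artifacts: Seq[Artifact]): Unit
}
{code}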



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692923#comment-17692923
 ] 

Apache Spark commented on SPARK-42543:
--

User 'vicennial' has created a pull request for this issue:
https://github.com/apache/spark/pull/40147

> Specify protocol for UDF artifact transfer in JVM/Scala client 
> ---
>
> Key: SPARK-42543
> URL: https://issues.apache.org/jira/browse/SPARK-42543
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> An "artifact" is any file that may be used during the execution of a UDF.
> In the decoupled client-server architecture of Spark Connect, a remote client 
> may use a local JAR or a new class in their UDF that may not be present on 
> the server. To handle these cases of missing "artifacts", a protocol for 
> artifact transfer is needed to move the required artifacts from the client 
> side over to the server side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42543:


Assignee: (was: Apache Spark)

> Specify protocol for UDF artifact transfer in JVM/Scala client 
> ---
>
> Key: SPARK-42543
> URL: https://issues.apache.org/jira/browse/SPARK-42543
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> An "artifact" is any file that may be used during the execution of a UDF.
> In the decoupled client-server architecture of Spark Connect, a remote client 
> may use a local JAR or a new class in their UDF that may not be present on 
> the server. To handle these cases of missing "artifacts", a protocol for 
> artifact transfer is needed to move the required artifacts from the client 
> side over to the server side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42543:


Assignee: Apache Spark

> Specify protocol for UDF artifact transfer in JVM/Scala client 
> ---
>
> Key: SPARK-42543
> URL: https://issues.apache.org/jira/browse/SPARK-42543
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Apache Spark
>Priority: Major
>
> An "artifact" is any file that may be used during the execution of a UDF.
> In the decoupled client-server architecture of Spark Connect, a remote client 
> may use a local JAR or a new class in their UDF that may not be present on 
> the server. To handle these cases of missing "artifacts", a protocol for 
> artifact transfer is needed to move the required artifacts from the client 
> side over to the server side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42120) Add built-in table-valued function json_tuple

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42120:


Assignee: Apache Spark

> Add built-in table-valued function json_tuple
> -
>
> Key: SPARK-42120
> URL: https://issues.apache.org/jira/browse/SPARK-42120
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> Add `json_tuple` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
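As a rough illustration of what registering `json_tuple` as a table-valued function enables, the sketch below contrasts the long-standing LATERAL VIEW form with a FROM-clause call. The table and column names are invented for the example, and the exact TVF syntax accepted by the new registry entry may differ from what is shown.

{code:scala}
// Hypothetical table `events(id BIGINT, payload STRING)` holding JSON strings.
val events = spark.sql(
  """SELECT 1L AS id, '{"name":"a","action":"click"}' AS payload""")
events.createOrReplaceTempView("events")

// Existing generator syntax: json_tuple via LATERAL VIEW.
spark.sql(
  """SELECT e.id, j.name, j.action
    |FROM events e
    |LATERAL VIEW json_tuple(e.payload, 'name', 'action') j AS name, action
    |""".stripMargin).show()

// Form the ticket targets (sketch only): calling json_tuple in the FROM clause
// like other table-valued functions, here with LATERAL correlation.
spark.sql(
  """SELECT e.id, t.*
    |FROM events e, LATERAL json_tuple(e.payload, 'name', 'action') t
    |""".stripMargin).show()
{code}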



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42120) Add built-in table-valued function json_tuple

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692921#comment-17692921
 ] 

Apache Spark commented on SPARK-42120:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40146

> Add built-in table-valued function json_tuple
> -
>
> Key: SPARK-42120
> URL: https://issues.apache.org/jira/browse/SPARK-42120
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Add `json_tuple` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42120) Add built-in table-valued function json_tuple

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42120:


Assignee: (was: Apache Spark)

> Add built-in table-valued function json_tuple
> -
>
> Key: SPARK-42120
> URL: https://issues.apache.org/jira/browse/SPARK-42120
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Add `json_tuple` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42541) Support Pivot with provided pivot column values

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42541:


Assignee: Apache Spark  (was: Rui Wang)

> Support Pivot with provided pivot column values
> ---
>
> Key: SPARK-42541
> URL: https://issues.apache.org/jira/browse/SPARK-42541
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>
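The ticket has no description, but the title points at the existing Dataset API where the pivot values are supplied explicitly so Spark can skip the distinct-value scan. A minimal sketch of that call shape (the column and value list are example data, not taken from the ticket) is:

{code:scala}
import spark.implicits._

val sales = Seq(
  (2022, "dotNET", 15000), (2022, "Java", 20000),
  (2023, "dotNET", 48000), (2023, "Java", 30000)
).toDF("year", "course", "earnings")

// Supplying the pivot values up front avoids the extra job that would
// otherwise be run to collect the distinct values of the pivot column.
sales.groupBy("year")
  .pivot("course", Seq("dotNET", "Java"))
  .sum("earnings")
  .show()
{code}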




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42541) Support Pivot with provided pivot column values

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42541:


Assignee: Rui Wang  (was: Apache Spark)

> Support Pivot with provided pivot column values
> ---
>
> Key: SPARK-42541
> URL: https://issues.apache.org/jira/browse/SPARK-42541
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42541) Support Pivot with provided pivot column values

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692913#comment-17692913
 ] 

Apache Spark commented on SPARK-42541:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40145

> Support Pivot with provided pivot column values
> ---
>
> Key: SPARK-42541
> URL: https://issues.apache.org/jira/browse/SPARK-42541
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42473) An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42473:


Assignee: Apache Spark

> An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
> --
>
> Key: SPARK-42473
> URL: https://issues.apache.org/jira/browse/SPARK-42473
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.1
> Environment: spark 3.3.1
>Reporter: kevinshin
>Assignee: Apache Spark
>Priority: Major
>
> *when 'union all' is used and one select statement uses a* *Literal as the column value,
> while the other* *select statement has a computed expression for the same column,
> then the whole statement will fail to compile. An explicit cast will be needed.*
> for example:
> {color:#4c9aff}explain{color}
> {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color}
> {color:#4c9aff}*select* *null* *as* amt1, {*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2{color}
> {color:#4c9aff}*union* *all*{color}
> {color:#4c9aff}*select* {*}cast{*}('200.99' *as* 
> {*}decimal{*}(20,8)){*}/{*}100 *as* amt1,{*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2;{color}
> *will get the following error:* 
> org.apache.spark.{*}sql{*}.catalyst.expressions.Literal cannot be *cast* *to* 
> org.apache.spark.{*}sql{*}.catalyst.expressions.AnsiCast
> The SQL will need to change to : 
> {color:#4c9aff}explain{color}
> {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color}
> {color:#4c9aff}*select* *null* *as* amt1,{*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2{color}
> {color:#4c9aff}*union* *all*{color}
> {color:#4c9aff}*select* {color:#de350b}{*}cast{*}({color}{*}cast{*}('200.99' 
> *as* {*}decimal{*}(20,8)){*}/{*}100 *as* 
> {*}decimal{*}(20,8){color:#de350b}){color} *as* amt1,{*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2;{color}
>  
> *but this is not needed in spark 3.2.1; is this a bug in spark 3.3.1?* 
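Stripped of the Jira markup, the reported repro and workaround look roughly like the following. The table and column names are as in the report; the behaviour described is the reporter's observation on 3.3.1, not verified here.

{code:scala}
// Failing form reported on Spark 3.3.1: one branch of the UNION ALL is a plain
// literal, the other a computed decimal expression on the same column.
spark.sql("""
  INSERT OVERWRITE TABLE test.spark33_decimal_orc
  SELECT null AS amt1, CAST('256.99' AS DECIMAL(20,8)) AS amt2
  UNION ALL
  SELECT CAST('200.99' AS DECIMAL(20,8)) / 100 AS amt1,
         CAST('256.99' AS DECIMAL(20,8)) AS amt2
""")

// Reported workaround: wrap the computed branch in an explicit outer cast.
spark.sql("""
  INSERT OVERWRITE TABLE test.spark33_decimal_orc
  SELECT null AS amt1, CAST('256.99' AS DECIMAL(20,8)) AS amt2
  UNION ALL
  SELECT CAST(CAST('200.99' AS DECIMAL(20,8)) / 100 AS DECIMAL(20,8)) AS amt1,
         CAST('256.99' AS DECIMAL(20,8)) AS amt2
""")
{code}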



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42473) An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692857#comment-17692857
 ] 

Apache Spark commented on SPARK-42473:
--

User 'RunyaoChen' has created a pull request for this issue:
https://github.com/apache/spark/pull/40140

> An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
> --
>
> Key: SPARK-42473
> URL: https://issues.apache.org/jira/browse/SPARK-42473
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.1
> Environment: spark 3.3.1
>Reporter: kevinshin
>Priority: Major
>
> *when 'union all' is used and one select statement uses a* *Literal as the column value,
> while the other* *select statement has a computed expression for the same column,
> then the whole statement will fail to compile. An explicit cast will be needed.*
> for example:
> {color:#4c9aff}explain{color}
> {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color}
> {color:#4c9aff}*select* *null* *as* amt1, {*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2{color}
> {color:#4c9aff}*union* *all*{color}
> {color:#4c9aff}*select* {*}cast{*}('200.99' *as* 
> {*}decimal{*}(20,8)){*}/{*}100 *as* amt1,{*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2;{color}
> *will get the following error:* 
> org.apache.spark.{*}sql{*}.catalyst.expressions.Literal cannot be *cast* *to* 
> org.apache.spark.{*}sql{*}.catalyst.expressions.AnsiCast
> The SQL will need to change to : 
> {color:#4c9aff}explain{color}
> {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color}
> {color:#4c9aff}*select* *null* *as* amt1,{*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2{color}
> {color:#4c9aff}*union* *all*{color}
> {color:#4c9aff}*select* {color:#de350b}{*}cast{*}({color}{*}cast{*}('200.99' 
> *as* {*}decimal{*}(20,8)){*}/{*}100 *as* 
> {*}decimal{*}(20,8){color:#de350b}){color} *as* amt1,{*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2;{color}
>  
> *but this is not needed in spark 3.2.1; is this a bug in spark 3.3.1?* 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42473) An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42473:


Assignee: (was: Apache Spark)

> An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
> --
>
> Key: SPARK-42473
> URL: https://issues.apache.org/jira/browse/SPARK-42473
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.1
> Environment: spark 3.3.1
>Reporter: kevinshin
>Priority: Major
>
> *when 'union all' is used and one select statement uses a* *Literal as the column value,
> while the other* *select statement has a computed expression for the same column,
> then the whole statement will fail to compile. An explicit cast will be needed.*
> for example:
> {color:#4c9aff}explain{color}
> {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color}
> {color:#4c9aff}*select* *null* *as* amt1, {*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2{color}
> {color:#4c9aff}*union* *all*{color}
> {color:#4c9aff}*select* {*}cast{*}('200.99' *as* 
> {*}decimal{*}(20,8)){*}/{*}100 *as* amt1,{*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2;{color}
> *will get the following error:* 
> org.apache.spark.{*}sql{*}.catalyst.expressions.Literal cannot be *cast* *to* 
> org.apache.spark.{*}sql{*}.catalyst.expressions.AnsiCast
> The SQL will need to change to : 
> {color:#4c9aff}explain{color}
> {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color}
> {color:#4c9aff}*select* *null* *as* amt1,{*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2{color}
> {color:#4c9aff}*union* *all*{color}
> {color:#4c9aff}*select* {color:#de350b}{*}cast{*}({color}{*}cast{*}('200.99' 
> *as* {*}decimal{*}(20,8)){*}/{*}100 *as* 
> {*}decimal{*}(20,8){color:#de350b}){color} *as* amt1,{*}cast{*}('256.99' *as* 
> {*}decimal{*}(20,8)) *as* amt2;{color}
>  
> *but this is not needed in spark 3.2.1; is this a bug in spark 3.3.1?* 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41991) Interpreted mode subexpression elimination can throw exception during insert

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692854#comment-17692854
 ] 

Apache Spark commented on SPARK-41991:
--

User 'RunyaoChen' has created a pull request for this issue:
https://github.com/apache/spark/pull/40140

> Interpreted mode subexpression elimination can throw exception during insert
> 
>
> Key: SPARK-41991
> URL: https://issues.apache.org/jira/browse/SPARK-41991
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
> Fix For: 3.4.0
>
>
> Example:
> {noformat}
> drop table if exists tbl1;
> create table tbl1 (a int, b int) using parquet;
> set spark.sql.codegen.wholeStage=false;
> set spark.sql.codegen.factoryMode=NO_CODEGEN;
> insert into tbl1
> select id as a, id as b
> from range(1, 5);
> {noformat}
> This results in the following exception:
> {noformat}
> java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.ExpressionProxy cannot be cast to 
> org.apache.spark.sql.catalyst.expressions.Cast
>   at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2514)
>   at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2512)
> {noformat}
> The query produces 2 bigint values, but the table's schema expects 2 int 
> values, so Spark wraps each output field with a {{Cast}}.
> Later, in {{InterpretedUnsafeProjection}}, {{prepareExpressions}} tries to 
> wrap the two {{Cast}} expressions with an {{ExpressionProxy}}. However, the 
> parent expression of each {{Cast}} is a {{CheckOverflowInTableInsert}} 
> expression, which does not accept {{ExpressionProxy}} as a child.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42539:


Assignee: Apache Spark

> User-provided JARs can override Spark's Hive metadata client JARs when using 
> "builtin"
> --
>
> Key: SPARK-42539
> URL: https://issues.apache.org/jira/browse/SPARK-42539
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.2.3, 3.3.2
>Reporter: Erik Krogen
>Assignee: Apache Spark
>Priority: Major
>
> Recently we observed that on version 3.2.0 and Java 8, it is possible for 
> user-provided Hive JARs to break the ability for Spark, via the Hive metadata 
> client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when 
> using the default behavior of the "builtin" Hive version. After SPARK-35321, 
> when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client 
> version is used, we will call the method {{Hive.getWithoutRegisterFns()}} 
> (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for 
> example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break 
> with a {{NoSuchMethodError}}. This particular failure mode was resolved in 
> 3.2.1 by SPARK-37446, but while investigating, we found a general issue that 
> it's possible for user JARs to override Spark's own JARs -- but only inside 
> of the IsolatedClientLoader when using "builtin". This happens because even 
> when Spark is configured to use the "builtin" Hive classes, it still creates 
> a separate URLClassLoader for the HiveClientImpl used for HMS communication. 
> To get the set of JAR URLs to use for this classloader, Spark [collects all 
> of the JARs used by the user classloader (and its parent, and that 
> classloader's parent, and so 
> on)](https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438).
>  Thus the newly created classloader will have all of the same JARs as the 
> user classloader, but the ordering has been reversed! User JARs get 
> prioritized ahead of system JARs, because the classloader hierarchy is 
> traversed from bottom-to-top. For example let's say we have user JARs 
> "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this:
> {code}
> MutableURLClassLoader
> -- foo.jar
> -- hive-exec-2.3.8.jar
> -- parent: URLClassLoader
> - spark-core_2.12-3.2.0.jar
> - ...
> - hive-exec-2.3.9.jar
> - ...
> {code}
> This setup provides the expected behavior within the user classloader; it 
> will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the 
> MutableURLClassLoader is only checked if the class doesn't exist in the 
> parent. But when a JAR list is constructed for the IsolatedClientLoader, it 
> traverses the URLs from MutableURLClassLoader first, then its parent, so the 
> final list looks like (in order):
> {code}
> URLClassLoader [IsolatedClientLoader]
> -- foo.jar
> -- hive-exec-2.3.8.jar
> -- spark-core_2.12-3.2.0.jar
> -- ...
> -- hive-exec-2.3.9.jar
> -- ...
> -- parent: boot classloader (JVM classes)
> {code}
> Now when a lookup happens, all of the JARs are within the same 
> URLClassLoader, and the user JARs are in front of the Spark ones, so the user 
> JARs get prioritized. This is the opposite of the expected behavior when 
> using the default user/application classloader in Spark, which has 
> parent-first behavior, prioritizing the Spark/system classes over the user 
> classes. (Note that this behavior is correct when using the 
> {{ChildFirstURLClassLoader}}.)
> After SPARK-37446, the NoSuchMethodError is no longer an issue, but this 
> still breaks assumptions about how user JARs should be treated vs. system 
> JARs, and presents the ability for the client to break in other ways. For 
> example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have 
> been included; the changes in Hive 2.3.9 were needed to improve compatibility 
> with older HMS, so if a user were to accidentally include these older JARs, 
> it could break the ability of Spark to communicate with HMS 1.x
> I see two solutions to this:
> *(A) Remove the separate classloader entirely when using "builtin"*
> Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even 
> create a new classloader when using "builtin". This makes sense, as [called 
> out in this 
> comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], 
> since the point of "builtin" is to use the existing JARs on the classpath 
> anyway. This proposes simply extending the changes from SPARK-26839 to all 
> Java versions, instead of restricting to Java 9+ only.
> *(B) Reverse the 

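The core of the problem above is how the flat JAR list gets built: walking the classloader chain child-first and concatenating each loader's URLs puts user JARs ahead of Spark's. A minimal sketch of that traversal (not the actual HiveUtils code, just the shape of it):

{code:scala}
import java.net.{URL, URLClassLoader}

// Sketch of the bottom-up collection described in the issue: starting from the
// user's MutableURLClassLoader and following getParent, each URLClassLoader's
// JARs are appended, so the child's (user) JARs end up first in the flat list.
def collectJarUrls(start: ClassLoader): Seq[URL] =
  Iterator.iterate(start)(_.getParent)
    .takeWhile(_ != null)
    .flatMap {
      case u: URLClassLoader => u.getURLs.toSeq
      case _                 => Seq.empty[URL]
    }
    .toSeq

// A single URLClassLoader built from this list then resolves classes from the
// user JARs before Spark's own, which is the inversion the ticket describes.
{code}
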
[jira] [Commented] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692820#comment-17692820
 ] 

Apache Spark commented on SPARK-42539:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/40144

> User-provided JARs can override Spark's Hive metadata client JARs when using 
> "builtin"
> --
>
> Key: SPARK-42539
> URL: https://issues.apache.org/jira/browse/SPARK-42539
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.2.3, 3.3.2
>Reporter: Erik Krogen
>Priority: Major
>
> Recently we observed that on version 3.2.0 and Java 8, it is possible for 
> user-provided Hive JARs to break the ability for Spark, via the Hive metadata 
> client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when 
> using the default behavior of the "builtin" Hive version. After SPARK-35321, 
> when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client 
> version is used, we will call the method {{Hive.getWithoutRegisterFns()}} 
> (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for 
> example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break 
> with a {{NoSuchMethodError}}. This particular failure mode was resolved in 
> 3.2.1 by SPARK-37446, but while investigating, we found a general issue that 
> it's possible for user JARs to override Spark's own JARs -- but only inside 
> of the IsolatedClientLoader when using "builtin". This happens because even 
> when Spark is configured to use the "builtin" Hive classes, it still creates 
> a separate URLClassLoader for the HiveClientImpl used for HMS communication. 
> To get the set of JAR URLs to use for this classloader, Spark [collects all 
> of the JARs used by the user classloader (and its parent, and that 
> classloader's parent, and so 
> on)](https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438).
>  Thus the newly created classloader will have all of the same JARs as the 
> user classloader, but the ordering has been reversed! User JARs get 
> prioritized ahead of system JARs, because the classloader hierarchy is 
> traversed from bottom-to-top. For example let's say we have user JARs 
> "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this:
> {code}
> MutableURLClassLoader
> -- foo.jar
> -- hive-exec-2.3.8.jar
> -- parent: URLClassLoader
> - spark-core_2.12-3.2.0.jar
> - ...
> - hive-exec-2.3.9.jar
> - ...
> {code}
> This setup provides the expected behavior within the user classloader; it 
> will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the 
> MutableURLClassLoader is only checked if the class doesn't exist in the 
> parent. But when a JAR list is constructed for the IsolatedClientLoader, it 
> traverses the URLs from MutableURLClassLoader first, then its parent, so the 
> final list looks like (in order):
> {code}
> URLClassLoader [IsolatedClientLoader]
> -- foo.jar
> -- hive-exec-2.3.8.jar
> -- spark-core_2.12-3.2.0.jar
> -- ...
> -- hive-exec-2.3.9.jar
> -- ...
> -- parent: boot classloader (JVM classes)
> {code}
> Now when a lookup happens, all of the JARs are within the same 
> URLClassLoader, and the user JARs are in front of the Spark ones, so the user 
> JARs get prioritized. This is the opposite of the expected behavior when 
> using the default user/application classloader in Spark, which has 
> parent-first behavior, prioritizing the Spark/system classes over the user 
> classes. (Note that this behavior is correct when using the 
> {{ChildFirstURLClassLoader}}.)
> After SPARK-37446, the NoSuchMethodError is no longer an issue, but this 
> still breaks assumptions about how user JARs should be treated vs. system 
> JARs, and presents the ability for the client to break in other ways. For 
> example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have 
> been included; the changes in Hive 2.3.9 were needed to improve compatibility 
> with older HMS, so if a user were to accidentally include these older JARs, 
> it could break the ability of Spark to communicate with HMS 1.x
> I see two solutions to this:
> *(A) Remove the separate classloader entirely when using "builtin"*
> Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even 
> create a new classloader when using "builtin". This makes sense, as [called 
> out in this 
> comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], 
> since the point of "builtin" is to use the existing JARs on the classpath 
> anyway. This proposes simply extending the changes from SPARK-26839 to all 
> 

[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42539:


Assignee: (was: Apache Spark)

> User-provided JARs can override Spark's Hive metadata client JARs when using 
> "builtin"
> --
>
> Key: SPARK-42539
> URL: https://issues.apache.org/jira/browse/SPARK-42539
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.2.3, 3.3.2
>Reporter: Erik Krogen
>Priority: Major
>
> Recently we observed that on version 3.2.0 and Java 8, it is possible for 
> user-provided Hive JARs to break the ability for Spark, via the Hive metadata 
> client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when 
> using the default behavior of the "builtin" Hive version. After SPARK-35321, 
> when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client 
> version is used, we will call the method {{Hive.getWithoutRegisterFns()}} 
> (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for 
> example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break 
> with a {{NoSuchMethodError}}. This particular failure mode was resolved in 
> 3.2.1 by SPARK-37446, but while investigating, we found a general issue that 
> it's possible for user JARs to override Spark's own JARs -- but only inside 
> of the IsolatedClientLoader when using "builtin". This happens because even 
> when Spark is configured to use the "builtin" Hive classes, it still creates 
> a separate URLClassLoader for the HiveClientImpl used for HMS communication. 
> To get the set of JAR URLs to use for this classloader, Spark [collects all 
> of the JARs used by the user classloader (and its parent, and that 
> classloader's parent, and so 
> on)](https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438).
>  Thus the newly created classloader will have all of the same JARs as the 
> user classloader, but the ordering has been reversed! User JARs get 
> prioritized ahead of system JARs, because the classloader hierarchy is 
> traversed from bottom-to-top. For example let's say we have user JARs 
> "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this:
> {code}
> MutableURLClassLoader
> -- foo.jar
> -- hive-exec-2.3.8.jar
> -- parent: URLClassLoader
> - spark-core_2.12-3.2.0.jar
> - ...
> - hive-exec-2.3.9.jar
> - ...
> {code}
> This setup provides the expected behavior within the user classloader; it 
> will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the 
> MutableURLClassLoader is only checked if the class doesn't exist in the 
> parent. But when a JAR list is constructed for the IsolatedClientLoader, it 
> traverses the URLs from MutableURLClassLoader first, then its parent, so the 
> final list looks like (in order):
> {code}
> URLClassLoader [IsolatedClientLoader]
> -- foo.jar
> -- hive-exec-2.3.8.jar
> -- spark-core_2.12-3.2.0.jar
> -- ...
> -- hive-exec-2.3.9.jar
> -- ...
> -- parent: boot classloader (JVM classes)
> {code}
> Now when a lookup happens, all of the JARs are within the same 
> URLClassLoader, and the user JARs are in front of the Spark ones, so the user 
> JARs get prioritized. This is the opposite of the expected behavior when 
> using the default user/application classloader in Spark, which has 
> parent-first behavior, prioritizing the Spark/system classes over the user 
> classes. (Note that this behavior is correct when using the 
> {{ChildFirstURLClassLoader}}.)
> After SPARK-37446, the NoSuchMethodError is no longer an issue, but this 
> still breaks assumptions about how user JARs should be treated vs. system 
> JARs, and presents the ability for the client to break in other ways. For 
> example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have 
> been included; the changes in Hive 2.3.9 were needed to improve compatibility 
> with older HMS, so if a user were to accidentally include these older JARs, 
> it could break the ability of Spark to communicate with HMS 1.x
> I see two solutions to this:
> *(A) Remove the separate classloader entirely when using "builtin"*
> Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even 
> create a new classloader when using "builtin". This makes sense, as [called 
> out in this 
> comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], 
> since the point of "builtin" is to use the existing JARs on the classpath 
> anyway. This proposes simply extending the changes from SPARK-26839 to all 
> Java versions, instead of restricting to Java 9+ only.
> *(B) Reverse the ordering of parent/child 

[jira] [Assigned] (SPARK-42538) `functions#lit` support more types

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42538:


Assignee: (was: Apache Spark)

> `functions#lit` support more types 
> ---
>
> Key: SPARK-42538
> URL: https://issues.apache.org/jira/browse/SPARK-42538
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42538) `functions#lit` support more types

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42538:


Assignee: Apache Spark

> `functions#lit` support more types 
> ---
>
> Key: SPARK-42538
> URL: https://issues.apache.org/jira/browse/SPARK-42538
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42538) `functions#lit` support more types

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692698#comment-17692698
 ] 

Apache Spark commented on SPARK-42538:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40143

> `functions#lit` support more types 
> ---
>
> Key: SPARK-42538
> URL: https://issues.apache.org/jira/browse/SPARK-42538
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41171) Push down filter through window when partitionSpec is empty

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692628#comment-17692628
 ] 

Apache Spark commented on SPARK-41171:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40142

> Push down filter through window when partitionSpec is empty
> ---
>
> Key: SPARK-41171
> URL: https://issues.apache.org/jira/browse/SPARK-41171
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Sometimes, a filter compares a rank-like window function with a number.
> {code:java}
> SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM Tab1 WHERE rn <= 5
> {code}
> We can create a Limit(5) and push it down as the child of the Window.
> {code:java}
> SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM (SELECT * FROM Tab1 ORDER 
> BY a LIMIT 5) t
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta

2023-02-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692607#comment-17692607
 ] 

Apache Spark commented on SPARK-42406:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40141

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.4.0
>
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports 
> recursive fields by limiting the depth to a certain level. See the example below. 
> It assigns a 'NullType' for such a field when allowed depth is reached. 
> This causes a few issues. E.g. a repeated field as in the following example 
> results in an Array field with 'NullType'. Delta does not support null type in 
> a complex type.
> Actually `Array[NullType]` is not really useful anyway.
> How about this fix: Drop the recursive field when the limit is reached rather 
> than using a NullType. 
> The example below makes it clear:
> Consider a recursive Protobuf:
>  
> {code:python}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow depth of 2: 
>  
> {code:python}
>    df.select(
>     'proto',
>      messageName = 'TreeNode',
>      options = { ... "recursive.fields.max.depth" : "2" }
>   ).printSchema()
> {code}
> Schema looks like this:
> {noformat}
> root
> |– from_protobuf(proto): struct (nullable = true)|
> | |– value: string (nullable = true)|
> | |– children: array (nullable = false)|
> | | |– element: struct (containsNull = false)|
> | | | |– value: string (nullable = true)|
> | | | |– children: array (nullable = false)|
> | | | | |– element: struct (containsNull = false)|
> | | | | | |– value: string (nullable = true)|
> | | | | | |– children: array (nullable = false). [ === Proposed fix: Drop 
> this field === ]|
> | | | | | | |– element: void (containsNull = false) [ === NOTICE 'void' HERE 
> === ] 
> {noformat}
> When we try to write this to a delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column 
> from_protobuf(proto).children which is of ArrayType. Delta doesn't support 
> writing NullType in complex types.
> {noformat}
>  
> We could just drop the field 'element' when recursion depth is reached. It is 
> simpler and does not need to deal with NullType. We are ignoring the value 
> anyway. There is no use in keeping the field.
> Another issue is setting for 'recursive.fields.max.depth': It is not enforced 
> correctly. '0' does not make sense. 
>  
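For reference, the Scala counterpart of the abbreviated Python snippet above would look roughly like this. The descriptor file path, the DataFrame `df`, and the column name are placeholders; only the `recursive.fields.max.depth` option comes from the issue.

{code:scala}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.protobuf.functions.from_protobuf

// Options built as a java.util.Map, matching the from_protobuf overload that
// takes (data, messageName, descFilePath, options).
val options = new java.util.HashMap[String, String]()
options.put("recursive.fields.max.depth", "2")

// `proto` is assumed to be a BINARY column of serialized TreeNode messages and
// /tmp/tree_node.desc a descriptor set containing the TreeNode definition.
val parsed = df.select(
  from_protobuf(col("proto"), "TreeNode", "/tmp/tree_node.desc", options).as("tree"))

// With a depth limit of 2 the schema bottoms out as shown in the issue; the
// proposed fix would drop the innermost `children` field instead of typing it void.
parsed.printSchema()
{code}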



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42406:


Assignee: Raghu Angadi  (was: Apache Spark)

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.4.0
>
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports 
> recursive fields by limiting the depth to a certain level. See the example below. 
> It assigns a 'NullType' for such a field when allowed depth is reached. 
> This causes a few issues. E.g. a repeated field as in the following example 
> results in an Array field with 'NullType'. Delta does not support null type in 
> a complex type.
> Actually `Array[NullType]` is not really useful anyway.
> How about this fix: Drop the recursive field when the limit is reached rather 
> than using a NullType. 
> The example below makes it clear:
> Consider a recursive Protobuf:
>  
> {code:python}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow depth of 2: 
>  
> {code:python}
>    df.select(
>     'proto',
>      messageName = 'TreeNode',
>      options = { ... "recursive.fields.max.depth" : "2" }
>   ).printSchema()
> {code}
> Schema looks like this:
> {noformat}
> root
> |– from_protobuf(proto): struct (nullable = true)|
> | |– value: string (nullable = true)|
> | |– children: array (nullable = false)|
> | | |– element: struct (containsNull = false)|
> | | | |– value: string (nullable = true)|
> | | | |– children: array (nullable = false)|
> | | | | |– element: struct (containsNull = false)|
> | | | | | |– value: string (nullable = true)|
> | | | | | |– children: array (nullable = false). [ === Proposed fix: Drop 
> this field === ]|
> | | | | | | |– element: void (containsNull = false) [ === NOTICE 'void' HERE 
> === ] 
> {noformat}
> When we try to write this to a delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column 
> from_protobuf(proto).children which is of ArrayType. Delta doesn't support 
> writing NullType in complex types.
> {noformat}
>  
> We could just drop the field 'element' when recursion depth is reached. It is 
> simpler and does not need to deal with NullType. We are ignoring the value 
> anyway. There is no use in keeping the field.
> Another issue is setting for 'recursive.fields.max.depth': It is not enforced 
> correctly. '0' does not make sense. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta

2023-02-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42406:


Assignee: Apache Spark  (was: Raghu Angadi)

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Raghu Angadi
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports 
> recursive fields by limiting the depth to a certain level. See the example below. 
> It assigns a 'NullType' for such a field when allowed depth is reached. 
> This causes a few issues. E.g. a repeated field as in the following example 
> results in an Array field with 'NullType'. Delta does not support null type in 
> a complex type.
> Actually `Array[NullType]` is not really useful anyway.
> How about this fix: Drop the recursive field when the limit is reached rather 
> than using a NullType. 
> The example below makes it clear:
> Consider a recursive Protobuf:
>  
> {code:python}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow depth of 2: 
>  
> {code:python}
>    df.select(
>     'proto',
>      messageName = 'TreeNode',
>      options = { ... "recursive.fields.max.depth" : "2" }
>   ).printSchema()
> {code}
> Schema looks like this:
> {noformat}
> root
> |– from_protobuf(proto): struct (nullable = true)|
> | |– value: string (nullable = true)|
> | |– children: array (nullable = false)|
> | | |– element: struct (containsNull = false)|
> | | | |– value: string (nullable = true)|
> | | | |– children: array (nullable = false)|
> | | | | |– element: struct (containsNull = false)|
> | | | | | |– value: string (nullable = true)|
> | | | | | |– children: array (nullable = false). [ === Proposed fix: Drop 
> this field === ]|
> | | | | | | |– element: void (containsNull = false) [ === NOTICE 'void' HERE 
> === ] 
> {noformat}
> When we try to write this to a delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column 
> from_protobuf(proto).children which is of ArrayType. Delta doesn't support 
> writing NullType in complex types.
> {noformat}
>  
> We could just drop the field 'element' when recursion depth is reached. It is 
> simpler and does not need to deal with NullType. We are ignoring the value 
> anyway. There is no use in keeping the field.
> Another issue is setting for 'recursive.fields.max.depth': It is not enforced 
> correctly. '0' does not make sense. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692512#comment-17692512
 ] 

Apache Spark commented on SPARK-42286:
--

User 'RunyaoChen' has created a pull request for this issue:
https://github.com/apache/spark/pull/40140

> Fix internal error for valid CASE WHEN expression with CAST when inserting 
> into a table
> ---
>
> Key: SPARK-42286
> URL: https://issues.apache.org/jira/browse/SPARK-42286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Runyao.Chen
>Assignee: Runyao.Chen
>Priority: Major
> Fix For: 3.4.0
>
>
> ```
> spark-sql> create or replace table es570639t1 as select x FROM values (1), 
> (2), (3) as tab(x);
> spark-sql> create or replace table es570639t2 (x Decimal(9, 0));
> spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x 
> end) from es570639t1 where x = 1;
> ```
> hits the following internal error
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast
>  
> Stack trace:
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:78) at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:82) at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309)
>  at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570)
>  
> This internal error comes from `CheckOverflowInTableInsert`'s `checkChild`, 
> which covers only the `Cast` expr and `ExpressionProxy` expr, but not the 
> `CaseWhen` expr.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39859) Support v2 `DESCRIBE TABLE EXTENDED` for columns

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692503#comment-17692503
 ] 

Apache Spark commented on SPARK-39859:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/40139

> Support v2 `DESCRIBE TABLE EXTENDED` for columns
> 
>
> Key: SPARK-39859
> URL: https://issues.apache.org/jira/browse/SPARK-39859
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.4.0
>
>
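The ticket body is empty; for orientation, the column-level form of the command that v2 tables gain support for looks like the sketch below. The table and column names are invented for the example, and the exact v2 output columns are not specified here.

{code:scala}
// DESCRIBE a single column with EXTENDED output; against a v1 table this
// already prints the column name, data type, and collected statistics, and the
// change tracked here brings the same command to DataSource v2 tables.
spark.sql("CREATE TABLE IF NOT EXISTS demo_tbl (id BIGINT, name STRING) USING parquet")
spark.sql("DESCRIBE TABLE EXTENDED demo_tbl name").show(truncate = false)
{code}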




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42049) Improve AliasAwareOutputExpression

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692481#comment-17692481
 ] 

Apache Spark commented on SPARK-42049:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/40137

> Improve AliasAwareOutputExpression
> --
>
> Key: SPARK-42049
> URL: https://issues.apache.org/jira/browse/SPARK-42049
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.4.0
>
>
> AliasAwareOutputExpression currently does not support the case where an attribute has more than 
> one alias.
> AliasAwareOutputExpression should also work for LogicalPlan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41793:


Assignee: (was: Apache Spark)

> Incorrect result for window frames defined by a range clause on large 
> decimals 
> ---
>
> Key: SPARK-41793
> URL: https://issues.apache.org/jira/browse/SPARK-41793
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gera Shegalov
>Priority: Blocker
>  Labels: correctness
>
> Context 
> https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
> The following windowing query on a simple two-row input should produce two 
> non-empty windows as a result
> {code}
> from pprint import pprint
> data = [
>   ('9223372036854775807', '11342371013783243717493546650944543.47'),
>   ('9223372036854775807', '.99')
> ]
> df1 = spark.createDataFrame(data, 'a STRING, b STRING')
> df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
> df2.createOrReplaceTempView('test_table')
> df = sql('''
>   SELECT 
> COUNT(1) OVER (
>   PARTITION BY a 
>   ORDER BY b ASC 
>   RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
> ) AS CNT_1 
>   FROM 
> test_table
>   ''')
> res = df.collect()
> df.explain(True)
> pprint(res)
> {code}
> Spark 3.4.0-SNAPSHOT output:
> {code}
> [Row(CNT_1=1), Row(CNT_1=0)]
> {code}
> Spark 3.3.1 output as expected:
> {code}
> [Row(CNT_1=1), Row(CNT_1=1)]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692480#comment-17692480
 ] 

Apache Spark commented on SPARK-41793:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40138

> Incorrect result for window frames defined by a range clause on large 
> decimals 
> ---
>
> Key: SPARK-41793
> URL: https://issues.apache.org/jira/browse/SPARK-41793
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gera Shegalov
>Priority: Blocker
>  Labels: correctness
>
> Context 
> https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
> The following windowing query on a simple two-row input should produce two 
> non-empty windows as a result
> {code}
> from pprint import pprint
> data = [
>   ('9223372036854775807', '11342371013783243717493546650944543.47'),
>   ('9223372036854775807', '.99')
> ]
> df1 = spark.createDataFrame(data, 'a STRING, b STRING')
> df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
> df2.createOrReplaceTempView('test_table')
> df = sql('''
>   SELECT 
> COUNT(1) OVER (
>   PARTITION BY a 
>   ORDER BY b ASC 
>   RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
> ) AS CNT_1 
>   FROM 
> test_table
>   ''')
> res = df.collect()
> df.explain(True)
> pprint(res)
> {code}
> Spark 3.4.0-SNAPSHOT output:
> {code}
> [Row(CNT_1=1), Row(CNT_1=0)]
> {code}
> Spark 3.3.1 output as expected:
> {code}
> [Row(CNT_1=1), Row(CNT_1=1)]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41793:


Assignee: Apache Spark

> Incorrect result for window frames defined by a range clause on large 
> decimals 
> ---
>
> Key: SPARK-41793
> URL: https://issues.apache.org/jira/browse/SPARK-41793
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gera Shegalov
>Assignee: Apache Spark
>Priority: Blocker
>  Labels: correctness
>
> Context 
> https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
> The following windowing query on a simple two-row input should produce two 
> non-empty windows as a result
> {code}
> from pprint import pprint
> data = [
>   ('9223372036854775807', '11342371013783243717493546650944543.47'),
>   ('9223372036854775807', '.99')
> ]
> df1 = spark.createDataFrame(data, 'a STRING, b STRING')
> df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
> df2.createOrReplaceTempView('test_table')
> df = sql('''
>   SELECT 
> COUNT(1) OVER (
>   PARTITION BY a 
>   ORDER BY b ASC 
>   RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
> ) AS CNT_1 
>   FROM 
> test_table
>   ''')
> res = df.collect()
> df.explain(True)
> pprint(res)
> {code}
> Spark 3.4.0-SNAPSHOT output:
> {code}
> [Row(CNT_1=1), Row(CNT_1=0)]
> {code}
> Spark 3.3.1 output as expected:
> {code}
> [Row(CNT_1=1), Row(CNT_1=1)]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42515) ClientE2ETestSuite local test failed

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692469#comment-17692469
 ] 

Apache Spark commented on SPARK-42515:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40136

> ClientE2ETestSuite local test failed
> 
>
> Key: SPARK-42515
> URL: https://issues.apache.org/jira/browse/SPARK-42515
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
>  
> Running `build/sbt clean "connect-client-jvm/test"` locally, 
> `ClientE2ETestSuite#write table` fails; it does not fail in GA.
>  
> {code:java}
> [info] - rite table *** FAILED *** (41 milliseconds)
> [info]   io.grpc.StatusRuntimeException: UNKNOWN: 
> org/apache/parquet/hadoop/api/ReadSupport
> [info]   at io.grpc.Status.asRuntimeException(Status.java:535)
> [info]   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
> [info]   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)
> [info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
> [info]   at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338)
> [info]   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
> [info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:431)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Suite.run(Suite.scala:1114)
> [info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> [info]   at 
> org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33)
> [info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> [info]   at 
> org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
> [info]   at 
> 

[jira] [Assigned] (SPARK-42515) ClientE2ETestSuite local test failed

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42515:


Assignee: (was: Apache Spark)

> ClientE2ETestSuite local test failed
> 
>
> Key: SPARK-42515
> URL: https://issues.apache.org/jira/browse/SPARK-42515
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
>  
> Running `build/sbt clean "connect-client-jvm/test"` locally, the
> `ClientE2ETestSuite#write table` test failed; it does not fail in GA.
>  
> {code:java}
> [info] - write table *** FAILED *** (41 milliseconds)
> [info]   io.grpc.StatusRuntimeException: UNKNOWN: 
> org/apache/parquet/hadoop/api/ReadSupport
> [info]   at io.grpc.Status.asRuntimeException(Status.java:535)
> [info]   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
> [info]   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)
> [info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
> [info]   at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338)
> [info]   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
> [info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:431)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Suite.run(Suite.scala:1114)
> [info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> [info]   at 
> org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33)
> [info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> [info]   at 
> org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
> [info]   at 

[jira] [Assigned] (SPARK-42515) ClientE2ETestSuite local test failed

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42515:


Assignee: Apache Spark

> ClientE2ETestSuite local test failed
> 
>
> Key: SPARK-42515
> URL: https://issues.apache.org/jira/browse/SPARK-42515
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
>  
> Running `build/sbt clean "connect-client-jvm/test"` locally, the
> `ClientE2ETestSuite#write table` test failed; it does not fail in GA.
>  
> {code:java}
> [info] - write table *** FAILED *** (41 milliseconds)
> [info]   io.grpc.StatusRuntimeException: UNKNOWN: 
> org/apache/parquet/hadoop/api/ReadSupport
> [info]   at io.grpc.Status.asRuntimeException(Status.java:535)
> [info]   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
> [info]   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)
> [info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
> [info]   at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338)
> [info]   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
> [info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:431)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Suite.run(Suite.scala:1114)
> [info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> [info]   at 
> org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33)
> [info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> [info]   at 
> org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
> 

[jira] [Assigned] (SPARK-42444) DataFrame.drop should handle multi columns properly

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42444:


Assignee: (was: Apache Spark)

> DataFrame.drop should handle multi columns properly
> ---
>
> Key: SPARK-42444
> URL: https://issues.apache.org/jira/browse/SPARK-42444
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Blocker
>
> {code:java}
> from pyspark.sql import Row
> df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], 
> ["age", "name"])
> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, 
> name="Bob")])
> df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
> {code}
> This works in 3.3
> {code:java}
> +------+
> |height|
> +------+
> |    85|
> |    80|
> +------+
> {code}
> but fails in 3.4
> {code:java}
> ---
> AnalysisException Traceback (most recent call last)
> Cell In[1], line 4
>   2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, 
> "Bob")], ["age", "name"])
>   3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), 
> Row(height=85, name="Bob")])
> > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 
> 'age').show()
> File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in 
> DataFrame.drop(self, *cols)
>4911 jcols = [_to_java_column(c) for c in cols]
>4912 first_column, *remaining_columns = jcols
> -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
>4915 return DataFrame(jdf, self.sparkSession)
> File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, 
> in JavaMember.__call__(self, *args)
>1316 command = proto.CALL_COMMAND_NAME +\
>1317 self.command_header +\
>1318 args_command +\
>1319 proto.END_COMMAND_PART
>1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>1323 answer, self.gateway_client, self.target_id, self.name)
>1325 for temp_arg in temp_args:
>1326 if hasattr(temp_arg, "_detach"):
> File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in 
> capture_sql_exception..deco(*a, **kw)
> 155 converted = convert_exception(e.java_exception)
> 156 if not isinstance(converted, UnknownException):
> 157 # Hide where the exception came from that shows a non-Pythonic
> 158 # JVM exception message.
> --> 159 raise converted from None
> 160 else:
> 161 raise
> AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could 
> be: [`name`, `name`].
> {code}
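
A minimal workaround sketch for the ambiguity above (not the fix from the linked PR): dropping the duplicated column through its Column reference instead of by name avoids the [AMBIGUOUS_REFERENCE] error, because the reference is already bound to one side of the join. The data and session setup below simply repeat the reproduction case.

{code:python}
# Workaround sketch, illustrative only: drop ambiguous columns by Column
# reference, then drop the unambiguous `age` by name. Assumes the usual
# single-column drop semantics shared by Spark 3.3 and 3.4.
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])

joined = df1.join(df2, df1.name == df2.name, "inner")

# Each drop below targets exactly one column, so the analyzer never has to
# resolve the ambiguous string "name"; only `height` remains, as in 3.3.
joined.drop(df1.name).drop(df2.name).drop("age").show()
{code}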



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692467#comment-17692467
 ] 

Apache Spark commented on SPARK-42444:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40135

> DataFrame.drop should handle multi columns properly
> ---
>
> Key: SPARK-42444
> URL: https://issues.apache.org/jira/browse/SPARK-42444
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Blocker
>
> {code:java}
> from pyspark.sql import Row
> df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], 
> ["age", "name"])
> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, 
> name="Bob")])
> df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
> {code}
> This works in 3.3
> {code:java}
> +------+
> |height|
> +------+
> |    85|
> |    80|
> +------+
> {code}
> but fails in 3.4
> {code:java}
> ---
> AnalysisException Traceback (most recent call last)
> Cell In[1], line 4
>   2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, 
> "Bob")], ["age", "name"])
>   3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), 
> Row(height=85, name="Bob")])
> > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 
> 'age').show()
> File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in 
> DataFrame.drop(self, *cols)
>4911 jcols = [_to_java_column(c) for c in cols]
>4912 first_column, *remaining_columns = jcols
> -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
>4915 return DataFrame(jdf, self.sparkSession)
> File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, 
> in JavaMember.__call__(self, *args)
>1316 command = proto.CALL_COMMAND_NAME +\
>1317 self.command_header +\
>1318 args_command +\
>1319 proto.END_COMMAND_PART
>1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>1323 answer, self.gateway_client, self.target_id, self.name)
>1325 for temp_arg in temp_args:
>1326 if hasattr(temp_arg, "_detach"):
> File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in 
> capture_sql_exception..deco(*a, **kw)
> 155 converted = convert_exception(e.java_exception)
> 156 if not isinstance(converted, UnknownException):
> 157 # Hide where the exception came from that shows a non-Pythonic
> 158 # JVM exception message.
> --> 159 raise converted from None
> 160 else:
> 161 raise
> AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could 
> be: [`name`, `name`].
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692466#comment-17692466
 ] 

Apache Spark commented on SPARK-42444:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40135

> DataFrame.drop should handle multi columns properly
> ---
>
> Key: SPARK-42444
> URL: https://issues.apache.org/jira/browse/SPARK-42444
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Blocker
>
> {code:java}
> from pyspark.sql import Row
> df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], 
> ["age", "name"])
> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, 
> name="Bob")])
> df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
> {code}
> This works in 3.3
> {code:java}
> +------+
> |height|
> +------+
> |    85|
> |    80|
> +------+
> {code}
> but fails in 3.4
> {code:java}
> ---
> AnalysisException Traceback (most recent call last)
> Cell In[1], line 4
>   2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, 
> "Bob")], ["age", "name"])
>   3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), 
> Row(height=85, name="Bob")])
> > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 
> 'age').show()
> File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in 
> DataFrame.drop(self, *cols)
>4911 jcols = [_to_java_column(c) for c in cols]
>4912 first_column, *remaining_columns = jcols
> -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
>4915 return DataFrame(jdf, self.sparkSession)
> File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, 
> in JavaMember.__call__(self, *args)
>1316 command = proto.CALL_COMMAND_NAME +\
>1317 self.command_header +\
>1318 args_command +\
>1319 proto.END_COMMAND_PART
>1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>1323 answer, self.gateway_client, self.target_id, self.name)
>1325 for temp_arg in temp_args:
>1326 if hasattr(temp_arg, "_detach"):
> File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in 
> capture_sql_exception..deco(*a, **kw)
> 155 converted = convert_exception(e.java_exception)
> 156 if not isinstance(converted, UnknownException):
> 157 # Hide where the exception came from that shows a non-Pythonic
> 158 # JVM exception message.
> --> 159 raise converted from None
> 160 else:
> 161 raise
> AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could 
> be: [`name`, `name`].
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42444) DataFrame.drop should handle multi columns properly

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42444:


Assignee: Apache Spark

> DataFrame.drop should handle multi columns properly
> ---
>
> Key: SPARK-42444
> URL: https://issues.apache.org/jira/browse/SPARK-42444
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Blocker
>
> {code:java}
> from pyspark.sql import Row
> df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], 
> ["age", "name"])
> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, 
> name="Bob")])
> df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
> {code}
> This works in 3.3
> {code:java}
> +------+
> |height|
> +------+
> |    85|
> |    80|
> +------+
> {code}
> but fails in 3.4
> {code:java}
> ---
> AnalysisException Traceback (most recent call last)
> Cell In[1], line 4
>   2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, 
> "Bob")], ["age", "name"])
>   3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), 
> Row(height=85, name="Bob")])
> > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 
> 'age').show()
> File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in 
> DataFrame.drop(self, *cols)
>4911 jcols = [_to_java_column(c) for c in cols]
>4912 first_column, *remaining_columns = jcols
> -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
>4915 return DataFrame(jdf, self.sparkSession)
> File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, 
> in JavaMember.__call__(self, *args)
>1316 command = proto.CALL_COMMAND_NAME +\
>1317 self.command_header +\
>1318 args_command +\
>1319 proto.END_COMMAND_PART
>1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>1323 answer, self.gateway_client, self.target_id, self.name)
>1325 for temp_arg in temp_args:
>1326 if hasattr(temp_arg, "_detach"):
> File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in 
> capture_sql_exception..deco(*a, **kw)
> 155 converted = convert_exception(e.java_exception)
> 156 if not isinstance(converted, UnknownException):
> 157 # Hide where the exception came from that shows a non-Pythonic
> 158 # JVM exception message.
> --> 159 raise converted from None
> 160 else:
> 161 raise
> AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could 
> be: [`name`, `name`].
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42534) Fix DB2 Limit clause

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692440#comment-17692440
 ] 

Apache Spark commented on SPARK-42534:
--

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40134

> Fix DB2 Limit clause
> 
>
> Key: SPARK-42534
> URL: https://issues.apache.org/jira/browse/SPARK-42534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ivan Sadikov
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42534) Fix DB2 Limit clause

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42534:


Assignee: (was: Apache Spark)

> Fix DB2 Limit clause
> 
>
> Key: SPARK-42534
> URL: https://issues.apache.org/jira/browse/SPARK-42534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ivan Sadikov
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42534) Fix DB2 Limit clause

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692439#comment-17692439
 ] 

Apache Spark commented on SPARK-42534:
--

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40134

> Fix DB2 Limit clause
> 
>
> Key: SPARK-42534
> URL: https://issues.apache.org/jira/browse/SPARK-42534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ivan Sadikov
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42534) Fix DB2 Limit clause

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42534:


Assignee: Apache Spark

> Fix DB2 Limit clause
> 
>
> Key: SPARK-42534
> URL: https://issues.apache.org/jira/browse/SPARK-42534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ivan Sadikov
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42533) SSL support for Scala Client

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692420#comment-17692420
 ] 

Apache Spark commented on SPARK-42533:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40133

> SSL support for Scala Client
> 
>
> Key: SPARK-42533
> URL: https://issues.apache.org/jira/browse/SPARK-42533
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Add basic encryption support for the Scala client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42533) SSL support for Scala Client

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42533:


Assignee: (was: Apache Spark)

> SSL support for Scala Client
> 
>
> Key: SPARK-42533
> URL: https://issues.apache.org/jira/browse/SPARK-42533
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Add basic encryption support for the Scala client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42533) SSL support for Scala Client

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42533:


Assignee: Apache Spark

> SSL support for Scala Client
> 
>
> Key: SPARK-42533
> URL: https://issues.apache.org/jira/browse/SPARK-42533
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Major
>
> Add basic encryption support for the Scala client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42533) SSL support for Scala Client

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692417#comment-17692417
 ] 

Apache Spark commented on SPARK-42533:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40133

> SSL support for Scala Client
> 
>
> Key: SPARK-42533
> URL: https://issues.apache.org/jira/browse/SPARK-42533
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Add basic encryption support for the Scala client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42532) Update YuniKorn documentation with v1.2

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42532:


Assignee: Apache Spark

> Update YuniKorn documentation with v1.2
> ---
>
> Key: SPARK-42532
> URL: https://issues.apache.org/jira/browse/SPARK-42532
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42532) Update YuniKorn documentation with v1.2

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42532:


Assignee: (was: Apache Spark)

> Update YuniKorn documentation with v1.2
> ---
>
> Key: SPARK-42532
> URL: https://issues.apache.org/jira/browse/SPARK-42532
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42532) Update YuniKorn documentation with v1.2

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692401#comment-17692401
 ] 

Apache Spark commented on SPARK-42532:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40132

> Update YuniKorn documentation with v1.2
> ---
>
> Key: SPARK-42532
> URL: https://issues.apache.org/jira/browse/SPARK-42532
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42150) Upgrade Volcano to 1.7.0

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692384#comment-17692384
 ] 

Apache Spark commented on SPARK-42150:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40131

> Upgrade Volcano to 1.7.0
> 
>
> Key: SPARK-42150
> URL: https://issues.apache.org/jira/browse/SPARK-42150
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42531) Scala Client Add Collection Functions

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42531:


Assignee: (was: Apache Spark)

> Scala Client Add Collection Functions
> -
>
> Key: SPARK-42531
> URL: https://issues.apache.org/jira/browse/SPARK-42531
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42531) Scala Client Add Collection Functions

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42531:


Assignee: Apache Spark

> Scala Client Add Collection Functions
> -
>
> Key: SPARK-42531
> URL: https://issues.apache.org/jira/browse/SPARK-42531
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42531) Scala Client Add Collection Functions

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692367#comment-17692367
 ] 

Apache Spark commented on SPARK-42531:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40130

> Scala Client Add Collection Functions
> -
>
> Key: SPARK-42531
> URL: https://issues.apache.org/jira/browse/SPARK-42531
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42529) Support Cube and Rollup

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42529:


Assignee: Apache Spark  (was: Rui Wang)

> Support Cube and Rollup
> ---
>
> Key: SPARK-42529
> URL: https://issues.apache.org/jira/browse/SPARK-42529
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>
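
For context, a short sketch of the existing DataFrame cube/rollup API that this task mirrors in the Connect client; the data and column names are illustrative, not from the ticket.

{code:python}
# Illustrative rollup/cube usage; the grouping sets are computed by Spark SQL.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("US", "web", 10), ("US", "mobile", 20), ("DE", "web", 5)],
    ["country", "channel", "sales"],
)

# rollup: hierarchical subtotals (country, channel), (country), grand total.
df.rollup("country", "channel").agg(F.sum("sales").alias("sales")).show()

# cube: subtotals for every combination of the grouping columns.
df.cube("country", "channel").agg(F.sum("sales").alias("sales")).show()
{code}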




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42529) Support Cube and Rollup

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692349#comment-17692349
 ] 

Apache Spark commented on SPARK-42529:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40129

> Support Cube and Rollup
> ---
>
> Key: SPARK-42529
> URL: https://issues.apache.org/jira/browse/SPARK-42529
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42529) Support Cube and Rollup

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42529:


Assignee: Rui Wang  (was: Apache Spark)

> Support Cube and Rollup
> ---
>
> Key: SPARK-42529
> URL: https://issues.apache.org/jira/browse/SPARK-42529
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692340#comment-17692340
 ] 

Apache Spark commented on SPARK-42466:
--

User 'shrprasa' has created a pull request for this issue:
https://github.com/apache/spark/pull/40128

> spark.kubernetes.file.upload.path not deleting files under HDFS after job 
> completes
> ---
>
> Key: SPARK-42466
> URL: https://issues.apache.org/jira/browse/SPARK-42466
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Jagadeeswara Rao
>Priority: Major
>
> In cluster mode, files uploaded to the HDFS location configured via the 
> spark.kubernetes.file.upload.path property are not cleared after the job 
> completes. Each file is successfully uploaded to an HDFS directory of the form 
> spark-upload-[randomUUID] when {{KubernetesUtils}} is requested to 
> uploadFileUri. 
> [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310]
> The following driver log shows the driver completing successfully while the 
> shutdown hook does not delete the HDFS files.
> {code:java}
> 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all 
> executors
> 23/02/16 18:06:56 INFO 
> KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
> executor to shut down
> 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed.
> 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared
> 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped
> 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped
> 23/02/16 18:06:57 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext
> 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code}
>  
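
Until the missing cleanup is addressed, a hedged sketch of a manual janitor for stale upload directories is shown below; the base path, the spark-upload-* naming assumption, and the choice to skip the HDFS trash are illustrative rather than part of the reported fix, and the script removes every matching directory, including ones a running job may still need.

{code:python}
#!/usr/bin/env python3
# Sketch: remove spark-upload-* directories left under the location configured
# by spark.kubernetes.file.upload.path. Path and flags are assumptions; adapt
# them (and add an age filter) before using against a shared cluster.
import subprocess

UPLOAD_BASE = "hdfs:///tmp/spark-uploads"  # assumed value of spark.kubernetes.file.upload.path


def list_upload_dirs(base):
    """Return the spark-upload-* paths directly under `base` via `hdfs dfs -ls`."""
    out = subprocess.run(
        ["hdfs", "dfs", "-ls", base],
        check=True, capture_output=True, text=True,
    ).stdout
    dirs = []
    for line in out.splitlines():
        parts = line.split()
        # The last field of each listing line is the full path; the "Found N items"
        # header line is skipped because it does not match the prefix.
        if parts and parts[-1].rsplit("/", 1)[-1].startswith("spark-upload-"):
            dirs.append(parts[-1])
    return dirs


def remove_dirs(paths):
    for path in paths:
        # -skipTrash avoids parking large upload artifacts in the HDFS trash.
        subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", path], check=True)


if __name__ == "__main__":
    remove_dirs(list_upload_dirs(UPLOAD_BASE))
{code}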



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692339#comment-17692339
 ] 

Apache Spark commented on SPARK-42466:
--

User 'shrprasa' has created a pull request for this issue:
https://github.com/apache/spark/pull/40128

> spark.kubernetes.file.upload.path not deleting files under HDFS after job 
> completes
> ---
>
> Key: SPARK-42466
> URL: https://issues.apache.org/jira/browse/SPARK-42466
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Jagadeeswara Rao
>Priority: Major
>
> In cluster mode, files uploaded to the HDFS location configured via the 
> spark.kubernetes.file.upload.path property are not cleared after the job 
> completes. Each file is successfully uploaded to an HDFS directory of the form 
> spark-upload-[randomUUID] when {{KubernetesUtils}} is requested to 
> uploadFileUri. 
> [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310]
> The following driver log shows the driver completing successfully while the 
> shutdown hook does not delete the HDFS files.
> {code:java}
> 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all 
> executors
> 23/02/16 18:06:56 INFO 
> KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
> executor to shut down
> 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed.
> 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared
> 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped
> 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped
> 23/02/16 18:06:57 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext
> 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42466:


Assignee: (was: Apache Spark)

> spark.kubernetes.file.upload.path not deleting files under HDFS after job 
> completes
> ---
>
> Key: SPARK-42466
> URL: https://issues.apache.org/jira/browse/SPARK-42466
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Jagadeeswara Rao
>Priority: Major
>
> In cluster mode, files uploaded to the HDFS location configured via the 
> spark.kubernetes.file.upload.path property are not cleared after the job 
> completes. Each file is successfully uploaded to an HDFS directory of the form 
> spark-upload-[randomUUID] when {{KubernetesUtils}} is requested to 
> uploadFileUri. 
> [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310]
> The following driver log shows the driver completing successfully while the 
> shutdown hook does not delete the HDFS files.
> {code:java}
> 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all 
> executors
> 23/02/16 18:06:56 INFO 
> KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
> executor to shut down
> 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed.
> 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared
> 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped
> 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped
> 23/02/16 18:06:57 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext
> 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42466:


Assignee: Apache Spark

> spark.kubernetes.file.upload.path not deleting files under HDFS after job 
> completes
> ---
>
> Key: SPARK-42466
> URL: https://issues.apache.org/jira/browse/SPARK-42466
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Jagadeeswara Rao
>Assignee: Apache Spark
>Priority: Major
>
> In cluster mode, files uploaded to the HDFS location configured via the 
> spark.kubernetes.file.upload.path property are not cleared after the job 
> completes. Each file is successfully uploaded to an HDFS directory of the form 
> spark-upload-[randomUUID] when {{KubernetesUtils}} is requested to 
> uploadFileUri. 
> [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310]
> The following driver log shows the driver completing successfully while the 
> shutdown hook does not delete the HDFS files.
> {code:java}
> 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all 
> executors
> 23/02/16 18:06:56 INFO 
> KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
> executor to shut down
> 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed.
> 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared
> 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped
> 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped
> 23/02/16 18:06:57 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext
> 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f
> 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory 
> /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692333#comment-17692333
 ] 

Apache Spark commented on SPARK-42530:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40127

> Remove Hadoop 2 from PySpark installation guide
> ---
>
> Key: SPARK-42530
> URL: https://issues.apache.org/jira/browse/SPARK-42530
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42530:


Assignee: (was: Apache Spark)

> Remove Hadoop 2 from PySpark installation guide
> ---
>
> Key: SPARK-42530
> URL: https://issues.apache.org/jira/browse/SPARK-42530
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692332#comment-17692332
 ] 

Apache Spark commented on SPARK-42530:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40127

> Remove Hadoop 2 from PySpark installation guide
> ---
>
> Key: SPARK-42530
> URL: https://issues.apache.org/jira/browse/SPARK-42530
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


