[jira] [Deleted] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue

2023-06-13 Thread Harshwardhan Singh Dodiya (Jira)


[ https://issues.apache.org/jira/browse/SPARK-44050 ]


Harshwardhan Singh Dodiya deleted comment on SPARK-44050:
---

was (Author: JIRAUSER300640):
!image-2023-06-14-11-07-36-960.png!

> Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue
> --
>
> Key: SPARK-44050
> URL: https://issues.apache.org/jira/browse/SPARK-44050
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.3.1
>Reporter: Harshwardhan Singh Dodiya
>Priority: Critical
> Attachments: image-2023-06-14-11-07-36-960.png
>
>
> Dear Spark community,
> I am facing an issue related to mounting a ConfigMap in the driver pod of my 
> Spark application. Upon investigation, I realized that the problem is caused 
> by the ConfigMap not being created successfully.
> *Problem Description:*
> When the driver pod attempts to mount the ConfigMap, it fails and the pod 
> stays in the ContainerCreating state. On further investigation, I found that 
> the ConfigMap does not exist in the Kubernetes cluster, so the driver pod 
> cannot access the required configuration data.
> *Additional Information:*
> This issue is not frequent: it occurs randomly, affecting the ConfigMap mount 
> in the driver pod only about 5% of the time. The intermittent behavior makes 
> it hard to reproduce and troubleshoot.
> *Error Message:*
> Describing the driver pod (kubectl describe pod pod_name) shows the following 
> error:
> "ConfigMap '' not found."
> *To Reproduce:*
> 1. Download Spark 3.3.1 from [https://spark.apache.org/downloads.html]
> 2. Build an image with "bin/docker-image-tool.sh"
> 3. Submit the application from the Spark client via spark-submit, passing all 
> the required details and configurations.
> 4. The issue appears randomly in some of the driver pods.
>  
>  






[jira] [Commented] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue

2023-06-13 Thread Harshwardhan Singh Dodiya (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732373#comment-17732373
 ] 

Harshwardhan Singh Dodiya commented on SPARK-44050:
---

!image-2023-06-14-11-07-36-960.png!

> Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue
> --
>
> Key: SPARK-44050
> URL: https://issues.apache.org/jira/browse/SPARK-44050
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.3.1
>Reporter: Harshwardhan Singh Dodiya
>Priority: Critical
> Attachments: image-2023-06-14-11-07-36-960.png
>
>
> Dear Spark community,
> I am facing an issue related to mounting a ConfigMap in the driver pod of my 
> Spark application. Upon investigation, I realized that the problem is caused 
> by the ConfigMap not being created successfully.
> *Problem Description:*
> When the driver pod attempts to mount the ConfigMap, it fails and the pod 
> stays in the ContainerCreating state. On further investigation, I found that 
> the ConfigMap does not exist in the Kubernetes cluster, so the driver pod 
> cannot access the required configuration data.
> *Additional Information:*
> This issue is not frequent: it occurs randomly, affecting the ConfigMap mount 
> in the driver pod only about 5% of the time. The intermittent behavior makes 
> it hard to reproduce and troubleshoot.
> *Error Message:*
> Describing the driver pod (kubectl describe pod pod_name) shows the following 
> error:
> "ConfigMap '' not found."
> *To Reproduce:*
> 1. Download Spark 3.3.1 from [https://spark.apache.org/downloads.html]
> 2. Build an image with "bin/docker-image-tool.sh"
> 3. Submit the application from the Spark client via spark-submit, passing all 
> the required details and configurations.
> 4. The issue appears randomly in some of the driver pods.
>  
>  






[jira] [Updated] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue

2023-06-13 Thread Harshwardhan Singh Dodiya (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshwardhan Singh Dodiya updated SPARK-44050:
--
Attachment: image-2023-06-14-11-07-36-960.png

> Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue
> --
>
> Key: SPARK-44050
> URL: https://issues.apache.org/jira/browse/SPARK-44050
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.3.1
>Reporter: Harshwardhan Singh Dodiya
>Priority: Critical
> Attachments: image-2023-06-14-11-07-36-960.png
>
>
> Dear Spark community,
> I am facing an issue related to mounting a ConfigMap in the driver pod of my 
> Spark application. Upon investigation, I realized that the problem is caused 
> by the ConfigMap not being created successfully.
> *Problem Description:*
> When the driver pod attempts to mount the ConfigMap, it fails and the pod 
> stays in the ContainerCreating state. On further investigation, I found that 
> the ConfigMap does not exist in the Kubernetes cluster, so the driver pod 
> cannot access the required configuration data.
> *Additional Information:*
> This issue is not frequent: it occurs randomly, affecting the ConfigMap mount 
> in the driver pod only about 5% of the time. The intermittent behavior makes 
> it hard to reproduce and troubleshoot.
> *Error Message:*
> Describing the driver pod (kubectl describe pod pod_name) shows the following 
> error:
> "ConfigMap '' not found."
> *To Reproduce:*
> 1. Download Spark 3.3.1 from [https://spark.apache.org/downloads.html]
> 2. Build an image with "bin/docker-image-tool.sh"
> 3. Submit the application from the Spark client via spark-submit, passing all 
> the required details and configurations.
> 4. The issue appears randomly in some of the driver pods.
>  
>  






[jira] [Created] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue

2023-06-13 Thread Harshwardhan Singh Dodiya (Jira)
Harshwardhan Singh Dodiya created SPARK-44050:
-

 Summary: Unable to Mount ConfigMap in Driver Pod - ConfigMap 
Creation Issue
 Key: SPARK-44050
 URL: https://issues.apache.org/jira/browse/SPARK-44050
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Spark Submit
Affects Versions: 3.3.1
Reporter: Harshwardhan Singh Dodiya


Dear Spark community,

I am facing an issue related to mounting a ConfigMap in the driver pod of my 
Spark application. Upon investigation, I realized that the problem is caused by 
the ConfigMap not being created successfully.

*Problem Description:*
When the driver pod attempts to mount the ConfigMap, it fails and the pod stays 
in the ContainerCreating state. On further investigation, I found that the 
ConfigMap does not exist in the Kubernetes cluster, so the driver pod cannot 
access the required configuration data.

*Additional Information:*

This issue is not frequent: it occurs randomly, affecting the ConfigMap mount in 
the driver pod only about 5% of the time. The intermittent behavior makes it 
hard to reproduce and troubleshoot.

*Error Message:*

Describing the driver pod (kubectl describe pod pod_name) shows the following error:

"ConfigMap '' not found."

*To Reproduce:*

1. Download Spark 3.3.1 from [https://spark.apache.org/downloads.html]

2. Build an image with "bin/docker-image-tool.sh"

3. Submit the application from the Spark client via spark-submit, passing all the 
required details and configurations.

4. The issue appears randomly in some of the driver pods.
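For triage, here is a minimal sketch, assuming the fabric8 Kubernetes client 
(the client library Spark's K8s backend uses), of how one might check whether 
the driver ConfigMap actually exists when the pod is stuck; the namespace and 
ConfigMap name are placeholders to be taken from the failing pod's volume spec:

{code:scala}
import io.fabric8.kubernetes.client.KubernetesClientBuilder

object ConfigMapCheck {
  def main(args: Array[String]): Unit = {
    val client = new KubernetesClientBuilder().build()
    try {
      // Placeholders: substitute the namespace and ConfigMap name from the
      // stuck driver pod's spec.
      val cm = client.configMaps()
        .inNamespace("spark-apps")
        .withName("my-app-driver-conf-map")
        .get()
      if (cm == null)
        println("ConfigMap missing -- consistent with the 'ConfigMap not found' event")
      else
        println(s"ConfigMap present with ${cm.getData.size()} entries")
    } finally client.close()
  }
}
{code}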

 

 






[jira] [Commented] (SPARK-44048) Remove sql-migration-old.md

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732353#comment-17732353
 ] 

Snoot.io commented on SPARK-44048:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/41583

> Remove sql-migration-old.md
> ---
>
> Key: SPARK-44048
> URL: https://issues.apache.org/jira/browse/SPARK-44048
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Commented] (SPARK-43932) Add current_* functions to Scala and Python

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732352#comment-17732352
 ] 

Snoot.io commented on SPARK-43932:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/41516

> Add current_* functions to Scala and Python
> ---
>
> Key: SPARK-43932
> URL: https://issues.apache.org/jira/browse/SPARK-43932
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>
> Add following functions:
> * curdate
> * current_catalog
> * current_database
> * current_schema
> * current_timezone
> * current_user
> * user
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
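For reference, a minimal usage sketch of the listed functions (assuming the 
Scala functions API that this ticket adds for 3.5):

{code:scala}
import org.apache.spark.sql.functions._

spark.range(1)
  .select(curdate(), current_catalog(), current_database(), current_schema(),
    current_timezone(), current_user(), user())
  .show(truncate = false)
{code}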






[jira] [Commented] (SPARK-44045) Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732351#comment-17732351
 ] 

Snoot.io commented on SPARK-44045:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41579

> Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`
> -
>
> Key: SPARK-44045
> URL: https://issues.apache.org/jira/browse/SPARK-44045
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-43932) Add current_* functions to Scala and Python

2023-06-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43932.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41516
[https://github.com/apache/spark/pull/41516]

> Add current_* functions to Scala and Python
> ---
>
> Key: SPARK-43932
> URL: https://issues.apache.org/jira/browse/SPARK-43932
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>
> Add following functions:
> * curdate
> * current_catalog
> * current_database
> * current_schema
> * current_timezone
> * current_user
> * user
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Assigned] (SPARK-43932) Add current_* functions to Scala and Python

2023-06-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43932:
-

Assignee: Ruifeng Zheng

> Add current_* functions to Scala and Python
> ---
>
> Key: SPARK-43932
> URL: https://issues.apache.org/jira/browse/SPARK-43932
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * curdate
> * current_catalog
> * current_database
> * current_schema
> * current_timezone
> * current_user
> * user
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Resolved] (SPARK-43981) Basic saving / loading implementation

2023-06-13 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu resolved SPARK-43981.

Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41478
[https://github.com/apache/spark/pull/41478]

> Basic saving / loading implementation
> -
>
> Key: SPARK-43981
> URL: https://issues.apache.org/jira/browse/SPARK-43981
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
> Fix For: 3.5.0
>
>
> Support saving/loading for estimators, transformers, evaluators, and models.
> We have some design goals:
>  * The model format is decoupled from Spark, i.e. we can run model inference 
> without a Spark service.
>  * We can save a model to either a local file system or cloud storage.
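In user-facing terms, the goals amount to something like the sketch below. This 
is hedged: it follows the existing spark.ml persistence style, the Connect 
implementation may differ, and `estimator`, `trainDf`, and the paths are 
placeholders:

{code:scala}
val model = estimator.fit(trainDf)

// Same API targets local or cloud storage file systems.
model.write.overwrite().save("/mnt/models/my-model")
model.write.overwrite().save("s3a://bucket/models/my-model")

// Design goal: the saved format is decoupled from Spark, so inference
// can run without a Spark service.
{code}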






[jira] [Commented] (SPARK-43655) Enable NamespaceParityTests.test_get_index_map

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732348#comment-17732348
 ] 

Snoot.io commented on SPARK-43655:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/41587

> Enable NamespaceParityTests.test_get_index_map
> --
>
> Key: SPARK-43655
> URL: https://issues.apache.org/jira/browse/SPARK-43655
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable NamespaceParityTests.test_get_index_map






[jira] [Commented] (SPARK-43654) Enable InternalFrameParityTests.test_from_pandas

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732347#comment-17732347
 ] 

Snoot.io commented on SPARK-43654:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/41587

> Enable InternalFrameParityTests.test_from_pandas
> 
>
> Key: SPARK-43654
> URL: https://issues.apache.org/jira/browse/SPARK-43654
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable InternalFrameParityTests.test_from_pandas






[jira] [Commented] (SPARK-43655) Enable NamespaceParityTests.test_get_index_map

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732346#comment-17732346
 ] 

Snoot.io commented on SPARK-43655:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/41587

> Enable NamespaceParityTests.test_get_index_map
> --
>
> Key: SPARK-43655
> URL: https://issues.apache.org/jira/browse/SPARK-43655
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable NamespaceParityTests.test_get_index_map






[jira] [Commented] (SPARK-43654) Enable InternalFrameParityTests.test_from_pandas

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732345#comment-17732345
 ] 

Snoot.io commented on SPARK-43654:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/41587

> Enable InternalFrameParityTests.test_from_pandas
> 
>
> Key: SPARK-43654
> URL: https://issues.apache.org/jira/browse/SPARK-43654
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable InternalFrameParityTests.test_from_pandas






[jira] [Commented] (SPARK-43474) Add support to create DataFrame Reference in Spark connect

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732344#comment-17732344
 ] 

Snoot.io commented on SPARK-43474:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/41580

> Add support to create DataFrame Reference in Spark connect
> --
>
> Key: SPARK-43474
> URL: https://issues.apache.org/jira/browse/SPARK-43474
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Peng Zhong
>Priority: Major
>
> Add support in Spark Connect for caching a DataFrame on the server side. Given 
> the cache key, the client can then create a reference to that DataFrame.
>  
> This function will be used in streaming foreachBatch(): the server needs to 
> call the user function for every batch, and that function takes a DataFrame 
> as its argument. With the new function, we can cache the DataFrame on the 
> server and pass its id back to the client, which creates the DataFrame 
> reference. The server resolves the reference during transformation.
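For context, the user-facing pattern this serves is sketched below (assuming 
`df` is a streaming DataFrame; the sink path is a placeholder). On Connect, the 
`batch` argument would be the kind of server-cached DataFrame reference this 
ticket adds:

{code:scala}
import org.apache.spark.sql.DataFrame

val writeBatch: (DataFrame, Long) => Unit = (batch, batchId) => {
  // Each micro-batch arrives as a DataFrame; on Connect this would be a
  // client-side reference to the DataFrame cached on the server.
  batch.write.mode("append").parquet(s"/tmp/out/batch=$batchId")
}

val query = df.writeStream.foreachBatch(writeBatch).start()
{code}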






[jira] [Commented] (SPARK-44049) Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732341#comment-17732341
 ] 

Snoot.io commented on SPARK-44049:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41586

> Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup
> --
>
> Key: SPARK-44049
> URL: https://issues.apache.org/jira/browse/SPARK-44049
> Project: Spark
>  Issue Type: Test
>  Components: Kubernetes, Tests
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Commented] (SPARK-44049) Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup

2023-06-13 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732342#comment-17732342
 ] 

Snoot.io commented on SPARK-44049:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41586

> Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup
> --
>
> Key: SPARK-44049
> URL: https://issues.apache.org/jira/browse/SPARK-44049
> Project: Spark
>  Issue Type: Test
>  Components: Kubernetes, Tests
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Created] (SPARK-44049) Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup

2023-06-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44049:
-

 Summary: Fix KubernetesSuite to use `inNamespace` for validating 
driver pod cleanup
 Key: SPARK-44049
 URL: https://issues.apache.org/jira/browse/SPARK-44049
 Project: Spark
  Issue Type: Test
  Components: Kubernetes, Tests
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-43622) Enable pyspark.pandas.spark.functions.var in Spark Connect.

2023-06-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43622:
-

Assignee: Ruifeng Zheng

> Enable pyspark.pandas.spark.functions.var in Spark Connect.
> ---
>
> Key: SPARK-43622
> URL: https://issues.apache.org/jira/browse/SPARK-43622
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Ruifeng Zheng
>Priority: Major
>
> Enable pyspark.pandas.spark.functions.var in Spark Connect.






[jira] [Assigned] (SPARK-43663) Enable SeriesParityTests.test_compare

2023-06-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43663:
-

Assignee: Haejoon Lee

> Enable SeriesParityTests.test_compare
> -
>
> Key: SPARK-43663
> URL: https://issues.apache.org/jira/browse/SPARK-43663
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Enable SeriesParityTests.test_compare






[jira] [Resolved] (SPARK-43663) Enable SeriesParityTests.test_compare

2023-06-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43663.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41567
[https://github.com/apache/spark/pull/41567]

> Enable SeriesParityTests.test_compare
> -
>
> Key: SPARK-43663
> URL: https://issues.apache.org/jira/browse/SPARK-43663
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> Enable SeriesParityTests.test_compare






[jira] [Commented] (SPARK-44048) Remove sql-migration-old.md

2023-06-13 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732336#comment-17732336
 ] 

Yuming Wang commented on SPARK-44048:
-

https://github.com/apache/spark/pull/41583

> Remove sql-migration-old.md
> ---
>
> Key: SPARK-44048
> URL: https://issues.apache.org/jira/browse/SPARK-44048
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Created] (SPARK-44048) Remove sql-migration-old.md

2023-06-13 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-44048:
---

 Summary: Remove sql-migration-old.md
 Key: SPARK-44048
 URL: https://issues.apache.org/jira/browse/SPARK-44048
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.5.0
Reporter: Yuming Wang









[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table

2023-06-13 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732320#comment-17732320
 ] 

Jia Fan commented on SPARK-43486:
-

I couldn't reproduce it either. :(

> number of files read is incorrect if it is bucket table
> ---
>
> Key: SPARK-43486
> URL: https://issues.apache.org/jira/browse/SPARK-43486
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 






[jira] [Updated] (SPARK-44021) Add spark.sql.files.maxPartitionNum

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44021:
--
Summary: Add spark.sql.files.maxPartitionNum  (was: Add a config to make it 
do not generate too many partitions)

> Add spark.sql.files.maxPartitionNum
> ---
>
> Key: SPARK-44021
> URL: https://issues.apache.org/jira/browse/SPARK-44021
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.5.0
>
>
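For readers landing here, a hedged usage sketch of the new config: the name 
comes from the updated summary, the exact semantics and default live in the PR, 
and the value and path below are placeholders:

{code:scala}
// Cap the number of partitions generated for file scans.
spark.conf.set("spark.sql.files.maxPartitionNum", "2000")

val df = spark.read.parquet("/data/many-small-files")
println(df.rdd.getNumPartitions) // expected to stay at or below the cap
{code}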







[jira] [Assigned] (SPARK-44021) Add a config to make it do not generate too many partitions

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44021:
-

Assignee: Yuming Wang

> Add a config to make it do not generate too many partitions
> ---
>
> Key: SPARK-44021
> URL: https://issues.apache.org/jira/browse/SPARK-44021
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>







[jira] [Resolved] (SPARK-44021) Add a config to make it do not generate too many partitions

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44021.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41545
[https://github.com/apache/spark/pull/41545]

> Add a config to make it do not generate too many partitions
> ---
>
> Key: SPARK-44021
> URL: https://issues.apache.org/jira/browse/SPARK-44021
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44047) Upgrade google guava for connect from 31.0.1-jre to 32.0.1-jre

2023-06-13 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44047:
---

 Summary: Upgrade google guava for connect from 31.0.1-jre to 
32.0.1-jre
 Key: SPARK-44047
 URL: https://issues.apache.org/jira/browse/SPARK-44047
 Project: Spark
  Issue Type: Improvement
  Components: Build, Connect
Affects Versions: 3.5.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table

2023-06-13 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732308#comment-17732308
 ] 

BingKun Pan commented on SPARK-43486:
-

Sorry, I couldn't reproduce it.

> number of files read is incorrect if it is bucket table
> ---
>
> Key: SPARK-43486
> URL: https://issues.apache.org/jira/browse/SPARK-43486
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 






[jira] [Assigned] (SPARK-43934) Add regexp_* functions to Scala and Python

2023-06-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43934:
-

Assignee: jiaan.geng

> Add regexp_* functions to Scala and Python
> --
>
> Key: SPARK-43934
> URL: https://issues.apache.org/jira/browse/SPARK-43934
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: jiaan.geng
>Priority: Major
>
> Add following functions:
> * rlike
> * regexp
> * regexp_count
> * regexp_extract_all
> * regexp_instr
> * regexp_like
> * regexp_substr
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
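For reference, a minimal usage sketch of the listed functions (assuming the 
column-based Scala signatures this ticket adds for 3.5):

{code:scala}
import org.apache.spark.sql.functions._

val df = spark.createDataFrame(Seq(("abc123def456", "[0-9]+"))).toDF("s", "pat")

df.select(
    rlike(col("s"), col("pat")),
    regexp(col("s"), col("pat")),
    regexp_count(col("s"), col("pat")),
    regexp_extract_all(col("s"), col("pat")),
    regexp_instr(col("s"), col("pat")),
    regexp_like(col("s"), col("pat")),
    regexp_substr(col("s"), col("pat")))
  .show(truncate = false)
{code}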






[jira] [Resolved] (SPARK-43934) Add regexp_* functions to Scala and Python

2023-06-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43934.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41515
[https://github.com/apache/spark/pull/41515]

> Add regexp_* functions to Scala and Python
> --
>
> Key: SPARK-43934
> URL: https://issues.apache.org/jira/browse/SPARK-43934
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>
> Add following functions:
> * rlike
> * regexp
> * regexp_count
> * regexp_extract_all
> * regexp_instr
> * regexp_like
> * regexp_substr
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Assigned] (SPARK-43691) Enable NumOpsParityTests.test_ne.

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-43691:


Assignee: Haejoon Lee

> Enable NumOpsParityTests.test_ne.
> -
>
> Key: SPARK-43691
> URL: https://issues.apache.org/jira/browse/SPARK-43691
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>







[jira] [Assigned] (SPARK-43684) Fix NullOps.eq to work with Spark Connect Column.

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-43684:


Assignee: Haejoon Lee

> Fix NullOps.eq to work with Spark Connect Column.
> -
>
> Key: SPARK-43684
> URL: https://issues.apache.org/jira/browse/SPARK-43684
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>







[jira] [Resolved] (SPARK-43684) Fix NullOps.eq to work with Spark Connect Column.

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43684.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41514
[https://github.com/apache/spark/pull/41514]

> Fix NullOps.eq to work with Spark Connect Column.
> -
>
> Key: SPARK-43684
> URL: https://issues.apache.org/jira/browse/SPARK-43684
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-43691) Enable NumOpsParityTests.test_ne.

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43691.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41514
[https://github.com/apache/spark/pull/41514]

> Enable NumOpsParityTests.test_ne.
> -
>
> Key: SPARK-43691
> URL: https://issues.apache.org/jira/browse/SPARK-43691
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-43686) Enable NumOpsParityTests.test_eq

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43686.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41514
[https://github.com/apache/spark/pull/41514]

> Enable NumOpsParityTests.test_eq
> 
>
> Key: SPARK-43686
> URL: https://issues.apache.org/jira/browse/SPARK-43686
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43685) Fix NullOps.ne to work with Spark Connect Column.

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-43685:


Assignee: Haejoon Lee

> Fix NullOps.ne to work with Spark Connect Column.
> -
>
> Key: SPARK-43685
> URL: https://issues.apache.org/jira/browse/SPARK-43685
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>







[jira] [Resolved] (SPARK-43685) Fix NullOps.ne to work with Spark Connect Column.

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43685.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41514
[https://github.com/apache/spark/pull/41514]

> Fix NullOps.ne to work with Spark Connect Column.
> -
>
> Key: SPARK-43685
> URL: https://issues.apache.org/jira/browse/SPARK-43685
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43686) Enable NumOpsParityTests.test_eq

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-43686:


Assignee: Haejoon Lee

> Enable NumOpsParityTests.test_eq
> 
>
> Key: SPARK-43686
> URL: https://issues.apache.org/jira/browse/SPARK-43686
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>







[jira] [Created] (SPARK-44046) Pyspark StreamingQueryListener listListener

2023-06-13 Thread Wei Liu (Jira)
Wei Liu created SPARK-44046:
---

 Summary: Pyspark StreamingQueryListener listListener
 Key: SPARK-44046
 URL: https://issues.apache.org/jira/browse/SPARK-44046
 Project: Spark
  Issue Type: New Feature
  Components: Structured Streaming
Affects Versions: 3.5.0
Reporter: Wei Liu









[jira] [Commented] (SPARK-43922) Add named argument support in parser for function call

2023-06-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732290#comment-17732290
 ] 

Hudson commented on SPARK-43922:


User 'learningchess2003' has created a pull request for this issue:
https://github.com/apache/spark/pull/41429

> Add named argument support in parser for function call
> --
>
> Key: SPARK-43922
> URL: https://issues.apache.org/jira/browse/SPARK-43922
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Richard Yu
>Priority: Major
>
> Today, we are implementing named argument support for user-defined functions, 
> some built-in functions, and table-valued functions. As a first step toward 
> building such a feature, we need to make some requisite changes in the 
> parser. 
> To accomplish this, in this issue we plan to add some new syntax tokens to 
> the Spark parser. Changes will also be made in the abstract syntax tree 
> builder to reflect these new tokens. These changes will first be restricted 
> to normal function calls (table-valued functions will be treated separately). 
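For illustration, the target call syntax looks roughly like this. A sketch only: 
`mask` is used on the assumption that it is one of the built-ins gaining 
named-parameter support, and the token/AST details are in the parser changes 
themselves:

{code:scala}
// A positional argument followed by named arguments in a normal function call.
spark.sql("SELECT mask('AbCD123-@$#', lowerChar => 'q', upperChar => 'Q')").show()
{code}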






[jira] [Assigned] (SPARK-38162) Optimize one row plan in normal and AQE Optimizer

2023-06-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-38162:
---

Assignee: XiDuo You

> Optimize one row plan in normal and AQE Optimizer
> -
>
> Key: SPARK-38162
> URL: https://issues.apache.org/jira/browse/SPARK-38162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.3.0
>
>
> Optimize the plan when a node's max row count is at most 1, in these cases:
> - if the child of a sort has max rows <= 1, remove the sort
> - if the child of a local sort has max rows per partition <= 1, remove the 
> local sort
> - if the child of an aggregate has max rows <= 1 and the aggregate is 
> grouping-only (including the rewritten distinct plan), remove the aggregate
> - if the child of an aggregate has max rows <= 1, set distinct to false in 
> all aggregate expressions
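As a quick illustration of the first case (a sketch; the rule name and exact 
plan shape are in the PR):

{code:scala}
val oneRow = spark.range(1)     // the optimizer knows maxRows = 1
val sorted = oneRow.sort("id")  // sorting at most one row is a no-op

// The optimized plan is expected to contain no Sort node.
sorted.explain(true)
{code}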






[jira] [Assigned] (SPARK-44045) Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44045:
-

Assignee: Dongjoon Hyun

> Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`
> -
>
> Key: SPARK-44045
> URL: https://issues.apache.org/jira/browse/SPARK-44045
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>







[jira] [Resolved] (SPARK-44045) Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44045.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41579
[https://github.com/apache/spark/pull/41579]

> Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`
> -
>
> Key: SPARK-44045
> URL: https://issues.apache.org/jira/browse/SPARK-44045
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44045) Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`

2023-06-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44045:
-

 Summary: Mark `WholeStageCodegenSparkSubmitSuite` as 
`ExtendedSQLTest`
 Key: SPARK-44045
 URL: https://issues.apache.org/jira/browse/SPARK-44045
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9

2023-06-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732236#comment-17732236
 ] 

Dongjoon Hyun commented on SPARK-44041:
---

Feel free to ping me when you make a PR~ I'm highly interested in validating 
and bringing this into Apache Spark repo, [~LuciferYang].

> Upgrade ammonite to 2.5.9
> -
>
> Key: SPARK-44041
> URL: https://issues.apache.org/jira/browse/SPARK-44041
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> To support Scala 2.12.18 & 2.13.11.
>  
> A release tag already exists: 
> [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9]
>  






[jira] [Commented] (SPARK-44044) Improve Error message for SQL Window functions

2023-06-13 Thread Siying Dong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732218#comment-17732218
 ] 

Siying Dong commented on SPARK-44044:
-

OSS PR created: [https://github.com/apache/spark/pull/41578/]

[~kabhwan] can you help take a look?

> Improve Error message for SQL Window functions
> --
>
> Key: SPARK-44044
> URL: https://issues.apache.org/jira/browse/SPARK-44044
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Siying Dong
>Priority: Trivial
>
> Right now, if a window spec is used in a streaming query, the error message 
> looks like the following:
> Non-time-based windows are not supported on streaming DataFrames/Datasets;
> Window [... 
> The message isn't very helpful for identifying what the problem is, and some 
> customers and even support engineers have been confused by it. It is 
> suggested that we call out the aggregation function over the window spec so 
> that users can more easily locate the part of the query that caused the 
> problem.






[jira] [Created] (SPARK-44044) Improve Error message for SQL Window functions

2023-06-13 Thread Siying Dong (Jira)
Siying Dong created SPARK-44044:
---

 Summary: Improve Error message for SQL Window functions
 Key: SPARK-44044
 URL: https://issues.apache.org/jira/browse/SPARK-44044
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.5.0
Reporter: Siying Dong


Right now, if a window spec is used in a streaming query, the error message 
looks like the following:
Non-time-based windows are not supported on streaming DataFrames/Datasets;
Window [... 
The message isn't very helpful for identifying what the problem is, and some 
customers and even support engineers have been confused by it. It is suggested 
that we call out the aggregation function over the window spec so that users can 
more easily locate the part of the query that caused the problem.
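A minimal sketch that reproduces the message (assuming the built-in rate 
source); the point is that the error names neither row_number() nor the 
offending window spec:

{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val stream = spark.readStream.format("rate").load() // columns: timestamp, value
val spec = Window.partitionBy(col("value") % 10).orderBy(col("timestamp"))

// Fails at analysis with:
//   Non-time-based windows are not supported on streaming DataFrames/Datasets;
//   Window [...
stream.withColumn("rn", row_number().over(spec))
  .writeStream.format("console").start()
{code}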






[jira] [Commented] (SPARK-44018) Improve the hashCode for Some DS V2 Expression

2023-06-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732189#comment-17732189
 ] 

Dongjoon Hyun commented on SPARK-44018:
---

It seems that I'm confused. "Improve XXX" means this is not a bug fix. Is this 
just an improvement PR for Apache Spark 3.5.0?

> Improve the hashCode for Some DS V2 Expression
> --
>
> Key: SPARK-44018
> URL: https://issues.apache.org/jira/browse/SPARK-44018
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not 
> good enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are 
> missing hashCode().
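For illustration only, a generic sketch of the kind of fix implied (not the 
actual PR; the class below is hypothetical): hashCode() should cover the same 
fields that equals() compares.

{code:scala}
import java.util.Objects

class GeneralScalarExprLike(val name: String, val children: Seq[AnyRef]) {
  override def equals(other: Any): Boolean = other match {
    case that: GeneralScalarExprLike =>
      name == that.name && children == that.children
    case _ => false
  }

  // Hash the same state equals() uses; omitting hashCode(), or hashing only
  // part of the state, breaks hash-based collections and de-duplication.
  override def hashCode(): Int = Objects.hash(name, children)
}
{code}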






[jira] [Commented] (SPARK-44018) Improve the hashCode for Some DS V2 Expression

2023-06-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732188#comment-17732188
 ] 

Dongjoon Hyun commented on SPARK-44018:
---

Hi, [~beliefer]. We need to update the `Affected Version` of this JIRA. It is 
currently 3.5.0, which means this is irrelevant to `branch-3.4` and Apache Spark 3.4.1.

> Improve the hashCode for Some DS V2 Expression
> --
>
> Key: SPARK-44018
> URL: https://issues.apache.org/jira/browse/SPARK-44018
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not 
> good enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are 
> missing hashCode().






[jira] [Created] (SPARK-44043) Reuse main scan exchange in group-based UPDATEs

2023-06-13 Thread Anton Okolnychyi (Jira)
Anton Okolnychyi created SPARK-44043:


 Summary: Reuse main scan exchange in group-based UPDATEs
 Key: SPARK-44043
 URL: https://issues.apache.org/jira/browse/SPARK-44043
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Anton Okolnychyi


Group-based UPDATE operations rewritten using UNION should reuse the main scan 
exchange.






[jira] [Updated] (SPARK-44040) Incorrect result after count distinct

2023-06-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44040:

Affects Version/s: 3.3.2

> Incorrect result after count distinct
> -
>
> Key: SPARK-44040
> URL: https://issues.apache.org/jira/browse/SPARK-44040
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Aleksandr Aleksandrov
>Priority: Critical
>
> When I call count after distinct on a null Decimal field, Spark returns an 
> incorrect result starting from Spark 3.4.0.
> A minimal example to reproduce:
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
> val schema = StructType(Array(
>   StructField("money", DecimalType(38, 6), true),
>   StructField("reference_id", StringType, true)
> ))
> val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema)
> val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
> val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", lit("df2"))
> val unionDF: DataFrame = aggDf.union(aggDf1)
> unionDF.select("money").distinct.show   // returns the correct result
> unionDF.select("money").distinct.count  // returns 2 instead of 1
> unionDF.select("money").distinct.count == 1 // returns false
> This block of code raises an assertion error and then an incorrect count (in 
> Spark 3.2.1 everything works fine and I get the correct result, 1):
> *scala> unionDF.select("money").distinct.show // return correct result*
> java.lang.AssertionError: assertion failed:
> Decimal$DecimalIsFractional
> while compiling: 
> during phase: globalPhase=terminal, enteringPhase=jvm
> library version: version 2.12.17
> compiler version: version 2.12.17
> reconstructed args: -classpath 
> /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar
>  -Yrepl-class-based -Yrepl-outdir 
> /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1
> last tree to typer: TypeTree(class Byte)
> tree position: line 6 of 
> tree tpe: Byte
> symbol: (final abstract) class Byte in package scala
> symbol definition: final abstract class Byte extends (a ClassSymbol)
> symbol package: scala
> symbol owners: class Byte
> call site: constructor $eval in object $eval in package $line19
> == Source file context for tree position ==
> 3
> 4 object $eval {
> 5   lazy val $result = 
> $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
> 6   lazy val $print: _root_.java.lang.String = {
> 7     $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
> 8
> 9     ""
> at 
> scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
> at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
> at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
> at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
> at 
> scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
> at 
> scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802)
> at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
> at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799)
> at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805)
> at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357)
> at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357)
> at 
> 

[jira] [Resolved] (SPARK-44016) Artifacts with name as an absolute path may overwrite other files

2023-06-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44016.
---
Resolution: Fixed

> Artifacts with name as an absolute path may overwrite other files 
> --
>
> Key: SPARK-44016
> URL: https://issues.apache.org/jira/browse/SPARK-44016
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.5.0
>
>
> In `SparkConnectAddArtifactsHandler`, an artifact being moved to a staging 
> location may overwrite another file when the `name`/`path` of the artifact is 
> an `absolute` path. 
> This happens because the 
> [stagedPath|https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala#L172]
>  is computed with the `.resolve(...)` method, and `resolve` returns the 
> `other` path (in this case, the name of the artifact) unchanged whenever 
> `other` is an absolute path.
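> 
> For reference, this matches the documented contract of 
> `java.nio.file.Path.resolve`; a minimal sketch (the paths here are 
> illustrative, not from the handler):
> {noformat}
> import java.nio.file.Paths
> 
> val stagingDir = Paths.get("/tmp/artifact-staging") // hypothetical staging dir
> stagingDir.resolve("libs/app.jar") // /tmp/artifact-staging/libs/app.jar
> stagingDir.resolve("/etc/passwd")  // /etc/passwd: an absolute "other" path is
>                                    // returned as-is, escaping the staging dir
> {noformat}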



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9

2023-06-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732173#comment-17732173
 ] 

Dongjoon Hyun commented on SPARK-44041:
---

Great! Thank you for making this happen, [~LuciferYang]!

> Upgrade ammonite to 2.5.9
> -
>
> Key: SPARK-44041
> URL: https://issues.apache.org/jira/browse/SPARK-44041
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> To support Scala 2.12.18 & 2.13.11
>  
> A tag already exists: 
> [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44040) Incorrect result after count distinct

2023-06-13 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732170#comment-17732170
 ] 

Yuming Wang commented on SPARK-44040:
-

https://github.com/apache/spark/pull/41576

> Incorrect result after count distinct
> -
>
> Key: SPARK-44040
> URL: https://issues.apache.org/jira/browse/SPARK-44040
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Aleksandr Aleksandrov
>Priority: Critical
>
> When I try to call count after the distinct function on a nullable Decimal 
> field, Spark returns an incorrect result starting from Spark 3.4.0.
> A minimal example to reproduce:
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
> val schema = StructType( Array(
> StructField("money", DecimalType(38,6), true),
> StructField("reference_id", StringType, true)
> ))
> val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema)
> val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
> val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", 
> lit("df2"))
> val unionDF: DataFrame = aggDf.union(aggDf1)
> unionDF.select("money").distinct.show // return correct result
> unionDF.select("money").distinct.count // return 2 instead of 1
> unionDF.select("money").distinct.count == 1 // return false
> This block of code returns an assertion error and, after that, an incorrect 
> count (in Spark 3.2.1 everything works fine and I get the correct result, 1):
> *scala> unionDF.select("money").distinct.show // return correct result*
> java.lang.AssertionError: assertion failed:
> Decimal$DecimalIsFractional
> while compiling: 
> during phase: globalPhase=terminal, enteringPhase=jvm
> library version: version 2.12.17
> compiler version: version 2.12.17
> reconstructed args: -classpath 
> /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar
>  -Yrepl-class-based -Yrepl-outdir 
> /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1
> last tree to typer: TypeTree(class Byte)
> tree position: line 6 of 
> tree tpe: Byte
> symbol: (final abstract) class Byte in package scala
> symbol definition: final abstract class Byte extends (a ClassSymbol)
> symbol package: scala
> symbol owners: class Byte
> call site: constructor $eval in object $eval in package $line19
> == Source file context for tree position ==
> 3
> 4    object $eval {
> 5      lazy val $result = 
> $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
> 6      lazy val $print: _root_.java.lang.String = {
> 7        $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
> 8
> 9        ""
> at 
> scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
> at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
> at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
> at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
> at 
> scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
> at 
> scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802)
> at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
> at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799)
> at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805)
> at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357)
> at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188)
> at 
> 

[jira] [Commented] (SPARK-44040) Incorrect result after count distinct

2023-06-13 Thread Bruce Robbins (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732163#comment-17732163
 ] 

Bruce Robbins commented on SPARK-44040:
---

It seems this can be reproduced in {{spark-sql}} as well.

Interestingly, turning off AQE seems to fix the issue (for both the above 
dataframe version and the below SQL version):
{noformat}
spark-sql (default)> create or replace temp view v1 as
select 1 as c1 limit 0;
Time taken: 0.959 seconds
spark-sql (default)> create or replace temp view agg1 as
select sum(c1) as c1, "agg1" as name
from v1;
Time taken: 0.16 seconds
spark-sql (default)> create or replace temp view agg2 as
select sum(c1) as c1, "agg2" as name
from v1;
Time taken: 0.035 seconds
spark-sql (default)> create or replace temp view union1 as
select * from agg1
union
select * from agg2;
Time taken: 0.088 seconds
spark-sql (default)> -- the following incorrectly produces 2 rows
select distinct c1 from union1;
NULL
NULL
Time taken: 1.649 seconds, Fetched 2 row(s)
spark-sql (default)> set spark.sql.adaptive.enabled=false;
spark.sql.adaptive.enabled  false
Time taken: 0.019 seconds, Fetched 1 row(s)
spark-sql (default)> -- the following correctly produces 1 row
select distinct c1 from union1;
NULL
Time taken: 1.372 seconds, Fetched 1 row(s)
spark-sql (default)> 
{noformat}
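
For the DataFrame repro above, the same mitigation is a one-line conf change 
(a temporary workaround sketch, not a fix; {{spark.sql.adaptive.enabled}} is 
the standard AQE switch):
{noformat}
// Workaround sketch: disable Adaptive Query Execution, then re-run the repro.
// This only masks the bug; the underlying issue still needs a proper fix.
spark.conf.set("spark.sql.adaptive.enabled", "false")
unionDF.select("money").distinct.count // now returns 1, consistent with show()
{noformat}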

> Incorrect result after count distinct
> -
>
> Key: SPARK-44040
> URL: https://issues.apache.org/jira/browse/SPARK-44040
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Aleksandr Aleksandrov
>Priority: Critical
>
> When I try to call count after the distinct function on a nullable Decimal 
> field, Spark returns an incorrect result starting from Spark 3.4.0.
> A minimal example to reproduce:
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
> val schema = StructType( Array(
> StructField("money", DecimalType(38,6), true),
> StructField("reference_id", StringType, true)
> ))
> val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema)
> val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
> val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", 
> lit("df2"))
> val unionDF: DataFrame = aggDf.union(aggDf1)
> unionDF.select("money").distinct.show // return correct result
> unionDF.select("money").distinct.count // return 2 instead of 1
> unionDF.select("money").distinct.count == 1 // return false
> This block of code returns an assertion error and, after that, an incorrect 
> count (in Spark 3.2.1 everything works fine and I get the correct result, 1):
> *scala> unionDF.select("money").distinct.show // return correct result*
> java.lang.AssertionError: assertion failed:
> Decimal$DecimalIsFractional
> while compiling: 
> during phase: globalPhase=terminal, enteringPhase=jvm
> library version: version 2.12.17
> compiler version: version 2.12.17
> reconstructed args: -classpath 
> /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar
>  -Yrepl-class-based -Yrepl-outdir 
> /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1
> last tree to typer: TypeTree(class Byte)
> tree position: line 6 of 
> tree tpe: Byte
> symbol: (final abstract) class Byte in package scala
> symbol definition: final abstract class Byte extends (a ClassSymbol)
> symbol package: scala
> symbol owners: class Byte
> call site: constructor $eval in object $eval in package $line19
> == Source file context for tree position ==
> 3
> 4    object $eval {
> 5      lazy val $result = 
> $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
> 6      lazy val $print: _root_.java.lang.String = {
> 7        $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
> 8
> 9        ""
> at 
> scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
> at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
> at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
> at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
> at 
> scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
> at 
> 

[jira] [Commented] (SPARK-44040) Incorrect result after count distinct

2023-06-13 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732159#comment-17732159
 ] 

Yuming Wang commented on SPARK-44040:
-

Thanks for reporting this bug. We will fix it soon.

> Incorrect result after count distinct
> -
>
> Key: SPARK-44040
> URL: https://issues.apache.org/jira/browse/SPARK-44040
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Aleksandr Aleksandrov
>Priority: Critical
>
> When I try to call count after the distinct function on a nullable Decimal 
> field, Spark returns an incorrect result starting from Spark 3.4.0.
> A minimal example to reproduce:
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
> val schema = StructType( Array(
> StructField("money", DecimalType(38,6), true),
> StructField("reference_id", StringType, true)
> ))
> val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema)
> val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
> val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", 
> lit("df2"))
> val unionDF: DataFrame = aggDf.union(aggDf1)
> unionDF.select("money").distinct.show // return correct result
> unionDF.select("money").distinct.count // return 2 instead of 1
> unionDF.select("money").distinct.count == 1 // return false
> This block of code returns an assertion error and, after that, an incorrect 
> count (in Spark 3.2.1 everything works fine and I get the correct result, 1):
> *scala> unionDF.select("money").distinct.show // return correct result*
> java.lang.AssertionError: assertion failed:
> Decimal$DecimalIsFractional
> while compiling: 
> during phase: globalPhase=terminal, enteringPhase=jvm
> library version: version 2.12.17
> compiler version: version 2.12.17
> reconstructed args: -classpath 
> /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar
>  -Yrepl-class-based -Yrepl-outdir 
> /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1
> last tree to typer: TypeTree(class Byte)
> tree position: line 6 of 
> tree tpe: Byte
> symbol: (final abstract) class Byte in package scala
> symbol definition: final abstract class Byte extends (a ClassSymbol)
> symbol package: scala
> symbol owners: class Byte
> call site: constructor $eval in object $eval in package $line19
> == Source file context for tree position ==
> 3
> 4    object $eval {
> 5      lazy val $result = 
> $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
> 6      lazy val $print: _root_.java.lang.String = {
> 7        $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
> 8
> 9        ""
> at 
> scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
> at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
> at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
> at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
> at 
> scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
> at 
> scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802)
> at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
> at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799)
> at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805)
> at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357)
> at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188)
> at 
> 

[jira] [Resolved] (SPARK-44028) Upgrade commons-io to 2.13.0

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44028.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41556
[https://github.com/apache/spark/pull/41556]

> Upgrade commons-io to 2.13.0
> 
>
> Key: SPARK-44028
> URL: https://issues.apache.org/jira/browse/SPARK-44028
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0
>
>
> https://commons.apache.org/proper/commons-io/changes-report.html#a2.13.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44028) Upgrade commons-io to 2.13.0

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44028:
-

Assignee: Yang Jie

> Upgrade commons-io to 2.13.0
> 
>
> Key: SPARK-44028
> URL: https://issues.apache.org/jira/browse/SPARK-44028
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> https://commons.apache.org/proper/commons-io/changes-report.html#a2.13.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44040) Incorrect result after count distinct

2023-06-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44040:

Target Version/s: 3.4.1

> Incorrect result after count distinct
> -
>
> Key: SPARK-44040
> URL: https://issues.apache.org/jira/browse/SPARK-44040
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Aleksandr Aleksandrov
>Priority: Critical
>
> When I try to call count after the distinct function on a nullable Decimal 
> field, Spark returns an incorrect result starting from Spark 3.4.0.
> A minimal example to reproduce:
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
> val schema = StructType( Array(
> StructField("money", DecimalType(38,6), true),
> StructField("reference_id", StringType, true)
> ))
> val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema)
> val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
> val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", 
> lit("df2"))
> val unionDF: DataFrame = aggDf.union(aggDf1)
> unionDF.select("money").distinct.show // return correct result
> unionDF.select("money").distinct.count // return 2 instead of 1
> unionDF.select("money").distinct.count == 1 // return false
> This block of code returns an assertion error and, after that, an incorrect 
> count (in Spark 3.2.1 everything works fine and I get the correct result, 1):
> *scala> unionDF.select("money").distinct.show // return correct result*
> java.lang.AssertionError: assertion failed:
> Decimal$DecimalIsFractional
> while compiling: 
> during phase: globalPhase=terminal, enteringPhase=jvm
> library version: version 2.12.17
> compiler version: version 2.12.17
> reconstructed args: -classpath 
> /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar
>  -Yrepl-class-based -Yrepl-outdir 
> /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1
> last tree to typer: TypeTree(class Byte)
> tree position: line 6 of 
> tree tpe: Byte
> symbol: (final abstract) class Byte in package scala
> symbol definition: final abstract class Byte extends (a ClassSymbol)
> symbol package: scala
> symbol owners: class Byte
> call site: constructor $eval in object $eval in package $line19
> == Source file context for tree position ==
> 3
> 4    object $eval {
> 5      lazy val $result = 
> $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
> 6      lazy val $print: _root_.java.lang.String = {
> 7        $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
> 8
> 9        ""
> at 
> scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
> at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
> at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
> at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
> at 
> scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
> at 
> scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
> at 
> scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802)
> at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
> at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799)
> at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805)
> at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357)
> at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188)
> at 
> scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357)
> at 
> 

[jira] [Created] (SPARK-44042) SPIP: PySpark Test Framework

2023-06-13 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-44042:
--

 Summary: SPIP: PySpark Test Framework
 Key: SPARK-44042
 URL: https://issues.apache.org/jira/browse/SPARK-44042
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu


Currently, there's no official PySpark test framework, but only various 
open-source repos and blog posts. Many of these open-source resources are very 
popular, which demonstrates user demand for PySpark testing capabilities. 
[spark-testing-base|https://github.com/holdenk/spark-testing-base] has 1.4k 
stars, and [chispa|https://github.com/MrPowers/chispa] has 532k 
downloads/month. However, it can be confusing for users to piece together 
disparate resources to write their own PySpark tests (see [The Elephant in the 
Room: How to Write PySpark 
Tests|https://towardsdatascience.com/the-elephant-in-the-room-how-to-write-pyspark-unit-tests-a5073acabc34]).
 We can streamline and simplify the testing process by incorporating test 
features, such as a PySpark Test Base class (which allows tests to share Spark 
sessions) and test util functions (for example, asserting dataframe and schema 
equality). Please see the full SPIP document attached: 
[https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite

2023-06-13 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-44039:

Description: 
Improve PlanGenerationTestSuite & ProtoToParsedPlanTestSuite, including:
- When generating `GOLDEN` files, we should first delete the corresponding 
directories and then generate new ones, to avoid committing redundant files 
during the review process (see the sketch after this list). E.g.:
Suppose we write a test named `make_timestamp_ltz` for an overloaded method, 
and during the review process the reviewer asks for more tests for the method, 
so the test is renamed in the next commit, e.g. to `make_timestamp_ltz without 
timezone`. At this point, if the `queries/function_make_timestamp_ltz.json`, 
`queries/function_make_timestamp_ltz.proto.bin` and 
`explain-results/function_make_timestamp_ltz.explain` files generated for the 
old name are already in the commit, and there are many such files, we 
generally do not notice the problem, so these files end up committed even 
though they have no effect on the tests. They are redundant.

- Clean up some redundant files that were committed incorrectly in this way
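
As referenced above, a minimal sketch of the "delete before regenerating" 
step (assuming commons-io is on the classpath; the directory path is 
illustrative, not the actual golden-file location):
{noformat}
import java.io.File
import org.apache.commons.io.FileUtils

// Wipe the golden output directory before regenerating, so files belonging to
// renamed or removed tests cannot survive into the next commit.
val goldenDir = new File("src/test/resources/query-tests/queries") // illustrative
if (goldenDir.exists()) FileUtils.deleteDirectory(goldenDir)
// ... then regenerate the GOLDEN files as usual ...
{noformat}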

> Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
> 
>
> Key: SPARK-44039
> URL: https://issues.apache.org/jira/browse/SPARK-44039
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>
> Improve PlanGenerationTestSuite & ProtoToParsedPlanTestSuite, including:
> - When generating `GOLDEN` files, we should first delete the corresponding 
> directories and then generate new ones, to avoid committing redundant files 
> during the review process. E.g.:
> Suppose we write a test named `make_timestamp_ltz` for an overloaded 
> method, and during the review process the reviewer asks for more tests for 
> the method, so the test is renamed in the next commit, e.g. to 
> `make_timestamp_ltz without timezone`. At this point, if the 
> `queries/function_make_timestamp_ltz.json`, 
> `queries/function_make_timestamp_ltz.proto.bin` and 
> `explain-results/function_make_timestamp_ltz.explain` files generated for 
> the old name are already in the commit, and there are many such files, we 
> generally do not notice the problem, so these files end up committed even 
> though they have no effect on the tests. They are redundant.
> - Clean up some redundant files that were committed incorrectly in this way



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9

2023-06-13 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732113#comment-17732113
 ] 

Yang Jie commented on SPARK-44041:
--

Waiting until it can be downloaded through Maven.

 

> Upgrade ammonite to 2.5.9
> -
>
> Key: SPARK-44041
> URL: https://issues.apache.org/jira/browse/SPARK-44041
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> To support Scala 2.12.18 & 2.13.11
>  
> A tag already exists: 
> [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite

2023-06-13 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732112#comment-17732112
 ] 

Ignite TC Bot commented on SPARK-44039:
---

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41572

> Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
> 
>
> Key: SPARK-44039
> URL: https://issues.apache.org/jira/browse/SPARK-44039
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38477) Use error classes in org.apache.spark.storage

2023-06-13 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732111#comment-17732111
 ] 

Ignite TC Bot commented on SPARK-38477:
---

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/41575

> Use error classes in org.apache.spark.storage
> -
>
> Key: SPARK-38477
> URL: https://issues.apache.org/jira/browse/SPARK-38477
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9

2023-06-13 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732106#comment-17732106
 ] 

Yang Jie commented on SPARK-44041:
--

cc [~dongjoon] 

> Upgrade ammonite to 2.5.9
> -
>
> Key: SPARK-44041
> URL: https://issues.apache.org/jira/browse/SPARK-44041
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> To support Scala 2.12.18 & 2.13.11
>  
> A tag already exists: 
> [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44041) Upgrade ammonite to 2.5.9

2023-06-13 Thread Yang Jie (Jira)
Yang Jie created SPARK-44041:


 Summary: Upgrade ammonite to 2.5.9
 Key: SPARK-44041
 URL: https://issues.apache.org/jira/browse/SPARK-44041
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: Yang Jie


To support Scala 2.12.18 & 2.13.11

 

A tag already exists: [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44040) Incorrect result after count distinct

2023-06-13 Thread Aleksandr Aleksandrov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Aleksandrov updated SPARK-44040:
--
Description: 
When I try to call count after the distinct function on a nullable Decimal 
field, Spark returns an incorrect result starting from Spark 3.4.0.
A minimal example to reproduce:

import org.apache.spark.sql.types._
import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
val schema = StructType( Array(
StructField("money", DecimalType(38,6), true),
StructField("reference_id", StringType, true)
))

val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema)

val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", lit("df2"))
val unionDF: DataFrame = aggDf.union(aggDf1)
unionDF.select("money").distinct.show // return correct result
unionDF.select("money").distinct.count // return 2 instead of 1
unionDF.select("money").distinct.count == 1 // return false


This block of code returns an assertion error and, after that, an incorrect 
count (in Spark 3.2.1 everything works fine and I get the correct result, 1):

*scala> unionDF.select("money").distinct.show // return correct result*
java.lang.AssertionError: assertion failed:
Decimal$DecimalIsFractional
while compiling: 
during phase: globalPhase=terminal, enteringPhase=jvm
library version: version 2.12.17
compiler version: version 2.12.17
reconstructed args: -classpath 
/Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar
 -Yrepl-class-based -Yrepl-outdir 
/private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1

last tree to typer: TypeTree(class Byte)
tree position: line 6 of 
tree tpe: Byte
symbol: (final abstract) class Byte in package scala
symbol definition: final abstract class Byte extends (a ClassSymbol)
symbol package: scala
symbol owners: class Byte
call site: constructor $eval in object $eval in package $line19

== Source file context for tree position ==

3
4    object $eval {
5      lazy val $result = 
$line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
6      lazy val $print: _root_.java.lang.String = {
7        $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
8
9        ""
at scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
at 
scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
at 
scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
at 
scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
at 
scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799)
at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805)
at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28)
at 
scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324)
at 
scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342)
at 
scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645)
at scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413)
at 
scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357)
at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188)
at 
scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357)
at 
scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$run$1(UnPickler.scala:96)
at scala.reflect.internal.pickling.UnPickler$Scan.run(UnPickler.scala:88)
at scala.reflect.internal.pickling.UnPickler.unpickle(UnPickler.scala:47)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser.unpickleOrParseInnerClasses(ClassfileParser.scala:1173)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser.parseClass(ClassfileParser.scala:467)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser.$anonfun$parse$2(ClassfileParser.scala:160)
at 

[jira] [Created] (SPARK-44040) Incorrect result after count distinct

2023-06-13 Thread Aleksandr Aleksandrov (Jira)
Aleksandr Aleksandrov created SPARK-44040:
-

 Summary: Incorrect result after count distinct
 Key: SPARK-44040
 URL: https://issues.apache.org/jira/browse/SPARK-44040
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Aleksandr Aleksandrov


When I try to call count after the distinct function on a nullable Decimal 
field, Spark returns an incorrect result starting from Spark 3.4.0.
A minimal example to reproduce:
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
val schema = StructType( Array(
 StructField("money", DecimalType(38,6), true),
 StructField("reference_id", StringType, true)
 ))

val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema)

val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", lit("df2"))
val unionDF: DataFrame = aggDf.union(aggDf1)
unionDF.select("money").distinct.show // return correct result
unionDF.select("money").distinct.count // return 2 instead of 1
unionDF.select("money").distinct.count == 1 // return false
This block of code returns an assertion error and, after that, an incorrect 
count (in Spark 3.2.1 everything works fine and I get the correct result, 1):


*scala> unionDF.select("money").distinct.show // return correct result*
java.lang.AssertionError: assertion failed:
Decimal$DecimalIsFractional
while compiling: 
during phase: globalPhase=terminal, enteringPhase=jvm
library version: version 2.12.17
compiler version: version 2.12.17
reconstructed args: -classpath 
/Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar
 -Yrepl-class-based -Yrepl-outdir 
/private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1

last tree to typer: TypeTree(class Byte)
tree position: line 6 of 
tree tpe: Byte
symbol: (final abstract) class Byte in package scala
symbol definition: final abstract class Byte extends (a ClassSymbol)
symbol package: scala
symbol owners: class Byte
call site: constructor $eval in object $eval in package $line19

== Source file context for tree position ==

3
4    object $eval {
5      lazy val $result = 
$line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
6      lazy val $print: _root_.java.lang.String = {
7        $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
8
9        ""
at scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
at 
scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
at 
scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
at 
scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
at 
scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799)
at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805)
at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28)
at 
scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324)
at 
scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342)
at 
scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645)
at scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413)
at 
scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357)
at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188)
at 
scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357)
at 
scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$run$1(UnPickler.scala:96)
at scala.reflect.internal.pickling.UnPickler$Scan.run(UnPickler.scala:88)
at scala.reflect.internal.pickling.UnPickler.unpickle(UnPickler.scala:47)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser.unpickleOrParseInnerClasses(ClassfileParser.scala:1173)
at 

[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table

2023-06-13 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732045#comment-17732045
 ] 

Jia Fan commented on SPARK-43486:
-

Hi [~panbingkun], any update on this?

> number of files read is incorrect if it is bucket table
> ---
>
> Key: SPARK-43486
> URL: https://issues.apache.org/jira/browse/SPARK-43486
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44036) Cleanup & consolidate tickets to simplify the tasks.

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44036.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41566
[https://github.com/apache/spark/pull/41566]

> Cleanup & consolidate tickets to simplify the tasks.
> 
>
> Key: SPARK-44036
> URL: https://issues.apache.org/jira/browse/SPARK-44036
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
We have so many tickets for pandas API on Spark with Spark Connect that it 
would be great if we could simplify them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44036) Cleanup & consolidate tickets to simplify the tasks.

2023-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44036:


Assignee: Haejoon Lee

> Cleanup & consolidate tickets to simplify the tasks.
> 
>
> Key: SPARK-44036
> URL: https://issues.apache.org/jira/browse/SPARK-44036
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> We have so many tickets for pandas API on Spark with Spark Connect that it 
> would be great if we could simplify them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite

2023-06-13 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44039:
---

 Summary: Improve for PlanGenerationTestSuite & 
ProtoToParsedPlanTestSuite
 Key: SPARK-44039
 URL: https://issues.apache.org/jira/browse/SPARK-44039
 Project: Spark
  Issue Type: Improvement
  Components: Connect, Tests
Affects Versions: 3.5.0
Reporter: BingKun Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43891) Support SHOW VIEWS IN <catalog>.<namespace> when <catalog> is not the current selected catalog

2023-06-13 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732031#comment-17732031
 ] 

Jia Fan commented on SPARK-43891:
-

cc [~cloud_fan] [~dongjoon] 

> Support SHOW VIEWS IN <catalog>.<namespace> when <catalog> is not the 
> current selected catalog
> ---
>
> Key: SPARK-43891
> URL: https://issues.apache.org/jira/browse/SPARK-43891
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-43891) Support SHOW VIEWS IN <catalog>.<namespace> when <catalog> is not the current selected catalog

2023-06-13 Thread Jia Fan (Jira)


[ https://issues.apache.org/jira/browse/SPARK-43891 ]


Jia Fan deleted comment on SPARK-43891:
-

was (Author: fanjia):
I can work for this.

> Support SHOW VIEWS IN <catalog>.<namespace> when <catalog> is not the 
> current selected catalog
> ---
>
> Key: SPARK-43891
> URL: https://issues.apache.org/jira/browse/SPARK-43891
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43891) Support SHOW VIEWS IN <catalog>.<namespace> when <catalog> is not the current selected catalog

2023-06-13 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732030#comment-17732030
 ] 

Jia Fan commented on SPARK-43891:
-

Hi [~amaliujia], I have a question about views. I see that Spark added a 
ViewCatalog for DataSourceV2, but we never use it (views cannot currently be 
created through ViewCatalog). As I understand it, this ticket would be 
implemented on DataSourceV2 so that we can list views across different 
catalogs. But if we do not support creating views there, what is the point of 
showing them?
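
For clarity, a sketch of the statement this ticket targets, assuming a 
hypothetical V2 catalog named `testcat` that is not the currently selected 
catalog (the names are illustrative, not from the ticket):
{noformat}
// Desired behavior: list views in a namespace of a non-current catalog.
spark.sql("SHOW VIEWS IN testcat.db1").show()
{noformat}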

> Support SHOW VIEWS IN <catalog>.<namespace> when <catalog> is not the 
> current selected catalog
> ---
>
> Key: SPARK-43891
> URL: https://issues.apache.org/jira/browse/SPARK-43891
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44038) Update YuniKorn docs with v1.3

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44038.
---
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

Issue resolved by pull request 41571
[https://github.com/apache/spark/pull/41571]

> Update YuniKorn docs with v1.3
> --
>
> Key: SPARK-44038
> URL: https://issues.apache.org/jira/browse/SPARK-44038
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.5.0, 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44038) Update YuniKorn docs with v1.3

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44038:
-

Assignee: Dongjoon Hyun

> Update YuniKorn docs with v1.3
> --
>
> Key: SPARK-44038
> URL: https://issues.apache.org/jira/browse/SPARK-44038
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43753) Incorrect result of MINUS in spark sql.

2023-06-13 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732012#comment-17732012
 ] 

Jia Fan commented on SPARK-43753:
-

This seems to be already fixed on the master branch.

> Incorrect result of MINUS in spark sql.
> ---
>
> Key: SPARK-43753
> URL: https://issues.apache.org/jira/browse/SPARK-43753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.3, 3.1.3
>Reporter: Kernel Force
>Priority: Major
>
> sql("""
> with va as (
>   select '123' id, 'a' name
>    union all
>   select '123' id, 'b' name
> )
> select '123' id, 'a' name from va t where t.name = 'a'
>  minus 
> select '123' id, 'a' name from va s where s.name = 'b'
> """).show
> +---+----+
> | id|name|
> +---+----+
> |123|   a|
> +---+----+
> which is expected to be an empty result set.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44038) Update YuniKorn docs with v1.3

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44038:
--
Issue Type: Documentation  (was: Improvement)

> Update YuniKorn docs with v1.3
> --
>
> Key: SPARK-44038
> URL: https://issues.apache.org/jira/browse/SPARK-44038
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44038) Update YuniKorn docs with v1.3

2023-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44038:
--
Affects Version/s: 3.4.1

> Update YuniKorn docs with v1.3
> --
>
> Key: SPARK-44038
> URL: https://issues.apache.org/jira/browse/SPARK-44038
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44038) Update YuniKorn docs with v1.3

2023-06-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44038:
-

 Summary: Update YuniKorn docs with v1.3
 Key: SPARK-44038
 URL: https://issues.apache.org/jira/browse/SPARK-44038
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, Kubernetes
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43891) Support SHOW VIEWS IN <catalog>.<namespace> when <catalog> is not the current selected catalog

2023-06-13 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732006#comment-17732006
 ] 

Jia Fan commented on SPARK-43891:
-

I can work for this.

> Support SHOW VIEWS IN <catalog>.<namespace> when <catalog> is not the 
> current selected catalog
> ---
>
> Key: SPARK-43891
> URL: https://issues.apache.org/jira/browse/SPARK-43891
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource

2023-06-13 Thread Dmitry Sysolyatin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sysolyatin updated SPARK-44037:
--
Description: 
CSV datasource supports the maxColumns and maxCharsPerColumn options, but 
those two options do not allow limiting the row size properly.

For instance, if I want to limit the row size to be less than or equal to 100, 
and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# A user cannot read a column longer than 10 characters, even if the row size 
is <= 100
# A user cannot read more than 10 columns, even if the row size is <= 100

I suggest adding an additional option, maxCharsPerRow.

  was:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two 
options do not allow limit row size properly.

For instance, if I want to limit the row size to be less than or equal to 100, 
and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more than 10 columns where each column < 5 chars even if 
row size <= 100

I suggest to add additional option maxCharsPerRow


> Add maxCharsPerRow option for CSV datasource
> 
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dmitry Sysolyatin
>Priority: Major
>
> CSV datasource supports the maxColumns and maxCharsPerColumn options, but 
> those two options do not allow limiting the row size properly.
> For instance, if I want to limit the row size to be less than or equal to 
> 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
> # A user cannot read a column with size > 10 even if row size <= 100
> # A user cannot read more than 10 columns even if row size <= 100
> I suggest adding an additional option, maxCharsPerRow.
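As a hedged sketch of the proposal: maxColumns and maxCharsPerColumn are existing CSV options, while maxCharsPerRow is only the option suggested here and is not implemented.

{code:python}
# Sketch only: the two existing caps, plus the proposed per-row cap
# (commented out because maxCharsPerRow does not exist yet).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = (spark.read
      .option("maxColumns", 10)           # existing: caps the number of columns
      .option("maxCharsPerColumn", 10)    # existing: caps each column's length
      # .option("maxCharsPerRow", 100)    # proposed: would cap the whole row
      .csv("/tmp/input.csv"))             # path is illustrative
{code}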



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource

2023-06-13 Thread Dmitry Sysolyatin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sysolyatin updated SPARK-44037:
--
Description: 
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two 
options do not allow limit row size properly.

For instance, if I want to limit the row size to be less than or equal to 100, 
and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more than 10 columns where each column < 5 chars even if 
row size <= 100

I suggest to add additional option maxCharsPerRow

  was:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two 
options do not allow limit row size properly.

For instance, if I want to limit the row size to be less than or equal to 100, 
and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if 
row size <= 100

I suggest to add additional option maxCharsPerRow


> Add maxCharsPerRow option for CSV datasource
> 
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dmitry Sysolyatin
>Priority: Major
>
> CSV datasource supports maxColumns and maxCharsPerColumn options. But those 
> two options do not allow limit row size properly.
> For instance, if I want to limit the row size to be less than or equal to 
> 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
> # User can not read column with size > 10 even if row size <= 100
> # User can not read more than 10 columns where each column < 5 chars even if 
> row size <= 100
> I suggest to add additional option maxCharsPerRow



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-44018) Improve the hashCode for Some DS V2 Expression

2023-06-13 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731988#comment-17731988
 ] 

jiaan.geng edited comment on SPARK-44018 at 6/13/23 8:38 AM:
-

[~dongjoon] Yes. I have created a PR for this: 
https://github.com/apache/spark/pull/41543


was (Author: beliefer):
[~dongjoon] Yes. I have created a PR for this.

> Improve the hashCode for Some DS V2 Expression
> --
>
> Key: SPARK-44018
> URL: https://issues.apache.org/jira/browse/SPARK-44018
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not 
> good enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are 
> missing hashCode().
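The affected classes are Scala, but the contract is language-general; as a hedged Python analogy, a type that defines equality over its fields needs a matching field-based hash, or hash-based collections treat equal values as distinct:

{code:python}
# Analogy only; the ticket concerns Spark's Scala DS V2 expression classes.
class ScalarFunc:
    def __init__(self, name, children):
        self.name = name
        self.children = tuple(children)

    def __eq__(self, other):
        return (isinstance(other, ScalarFunc)
                and self.name == other.name
                and self.children == other.children)

    # In Python, defining __eq__ without __hash__ makes the class
    # unhashable; in Scala, omitting hashCode() falls back to identity
    # hashing, so equal expressions can miss in hash-based lookups.
    def __hash__(self):
        return hash((self.name, self.children))
{code}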



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource

2023-06-13 Thread Dmitry Sysolyatin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sysolyatin updated SPARK-44037:
--
Description: 
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two 
options do not allow limit row size properly.

For instance, if I want to limit the row size to be less than or equal to 100, 
and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if 
row size <= 100

I suggest to add additional option maxCharsPerRow

  was:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two 
option does not allow restrict row size properly.

For instance, if I want to limit the row size to be less than or equal to 100, 
and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if 
row size <= 100

I suggest to add additional option maxCharsPerRow


> Add maxCharsPerRow option for CSV datasource
> 
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dmitry Sysolyatin
>Priority: Major
>
> CSV datasource supports maxColumns and maxCharsPerColumn options. But those 
> two options do not allow limit row size properly.
> For instance, if I want to limit the row size to be less than or equal to 
> 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
> # User can not read column with size > 10 even if row size <= 100
> # User can not read more then 10 columns where each column < 5 chars even if 
> row size <= 100
> I suggest to add additional option maxCharsPerRow



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource

2023-06-13 Thread Dmitry Sysolyatin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sysolyatin updated SPARK-44037:
--
Description: 
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two 
option does not allow restrict row size properly.

For instance, if I want to limit the row size to be less than or equal to 100, 
and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if 
row size <= 100

I suggest to add additional option maxCharsPerRow

  was:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two 
option does not allow restrict row size properly.

For instance, if I want to limit the row size to be less than or equal to 100, 
and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if 
row size <= 100


> Add maxCharsPerRow option for CSV datasource
> 
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dmitry Sysolyatin
>Priority: Major
>
> CSV datasource supports maxColumns and maxCharsPerColumn options. But those 
> two option does not allow restrict row size properly.
> For instance, if I want to limit the row size to be less than or equal to 
> 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
> # User can not read column with size > 10 even if row size <= 100
> # User can not read more then 10 columns where each column < 5 chars even if 
> row size <= 100
> I suggest to add additional option maxCharsPerRow



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44018) Improve the hashCode for Some DS V2 Expression

2023-06-13 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731988#comment-17731988
 ] 

jiaan.geng commented on SPARK-44018:


[~dongjoon] Yes. I have created a PR for this.

> Improve the hashCode for Some DS V2 Expression
> --
>
> Key: SPARK-44018
> URL: https://issues.apache.org/jira/browse/SPARK-44018
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not 
> good enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are 
> missing hashCode().



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44037) Add maxCharsPerRow option for CSV datasource

2023-06-13 Thread Dmitry Sysolyatin (Jira)
Dmitry Sysolyatin created SPARK-44037:
-

 Summary: Add maxCharsPerRow option for CSV datasource
 Key: SPARK-44037
 URL: https://issues.apache.org/jira/browse/SPARK-44037
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Dmitry Sysolyatin


CSV datasource supports maxColumns and maxCharsPerColumn options. But those two 
option does not allow restrict row size properly.

For instance, if I want to limit the row size to be less than or equal to 100, 
and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if 
row size <= 100



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43619) Enable DataFrameSlowParityTests.test_udt

2023-06-13 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43619:

Fix Version/s: 3.5.0

> Enable DataFrameSlowParityTests.test_udt
> 
>
> Key: SPARK-43619
> URL: https://issues.apache.org/jira/browse/SPARK-43619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> Repro:
> {code:python}
> # Imports assumed by the repro (pandas, pandas API on Spark, ML vectors)
> import pandas as pd
> import pyspark.pandas as ps
> from pyspark.ml.linalg import SparseVector
>
> sparse_values = {0: 0.1, 1: 1.1}
> sparse_vector = SparseVector(len(sparse_values), sparse_values)
> pdf = pd.DataFrame({"a": [sparse_vector], "b": [10]})
> psdf = ps.from_pandas(pdf) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43661) Enable ReshapeParityTests.test_get_dummies_date_datetime

2023-06-13 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43661:

Fix Version/s: 3.5.0

> Enable ReshapeParityTests.test_get_dummies_date_datetime
> 
>
> Key: SPARK-43661
> URL: https://issues.apache.org/jira/browse/SPARK-43661
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> Enable ReshapeParityTests.test_get_dummies_date_datetime.
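For context, a minimal hedged sketch of the pandas-on-Spark API the parity test exercises (the data is illustrative):

{code:python}
# Sketch only: one-hot encode a date column via the pandas API on Spark.
import datetime
import pyspark.pandas as ps

psdf = ps.DataFrame({"d": [datetime.date(2023, 6, 13),
                           datetime.date(2023, 6, 14)]})
print(ps.get_dummies(psdf, columns=["d"]))  # one indicator column per date
{code}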



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43619) Enable DataFrameSlowParityTests.test_udt

2023-06-13 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-43619.
-
Resolution: Resolved

This is subsumed by SPARK-44036.

> Enable DataFrameSlowParityTests.test_udt
> 
>
> Key: SPARK-43619
> URL: https://issues.apache.org/jira/browse/SPARK-43619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Repro:
> {code:python}
> # Imports assumed by the repro (pandas, pandas API on Spark, ML vectors)
> import pandas as pd
> import pyspark.pandas as ps
> from pyspark.ml.linalg import SparseVector
>
> sparse_values = {0: 0.1, 1: 1.1}
> sparse_vector = SparseVector(len(sparse_values), sparse_values)
> pdf = pd.DataFrame({"a": [sparse_vector], "b": [10]})
> psdf = ps.from_pandas(pdf) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43661) Enable ReshapeParityTests.test_get_dummies_date_datetime

2023-06-13 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-43661.
-
Resolution: Resolved

This is subsumed by SPARK-44036.

> Enable ReshapeParityTests.test_get_dummies_date_datetime
> 
>
> Key: SPARK-43661
> URL: https://issues.apache.org/jira/browse/SPARK-43661
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable ReshapeParityTests.test_get_dummies_date_datetime



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44036) Cleanup & consolidate tickets to simplify the tasks.

2023-06-13 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-44036:
---

 Summary: Cleanup & consolidate tickets to simplify the tasks.
 Key: SPARK-44036
 URL: https://issues.apache.org/jira/browse/SPARK-44036
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, Pandas API on Spark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


We have a large number of tickets for pandas API on Spark with Spark Connect, 
so it would be good to consolidate and simplify them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43710) Support functions.date_part for Spark Connect

2023-06-13 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-43710.
-
Resolution: Duplicate

It is a duplicate of SPARK-43705.

> Support functions.date_part for Spark Connect
> -
>
> Key: SPARK-43710
> URL: https://issues.apache.org/jira/browse/SPARK-43710
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Repro: run `TimedeltaIndexParityTests.test_properties`
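For context, date_part is already available through SQL; this ticket tracks exposing the functions.date_part API under Spark Connect as well. A minimal hedged sketch of the existing SQL form:

{code:python}
# Sketch only: the SQL form of date_part, which already works.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("SELECT date_part('YEAR', TIMESTAMP '2023-06-13 08:00:00')").show()
{code}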



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44035) Split `pyspark.pandas.tests.connect.test_parity_ops_on_diff_frames_slow`

2023-06-13 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44035:
-

 Summary: Split 
`pyspark.pandas.tests.connect.test_parity_ops_on_diff_frames_slow`
 Key: SPARK-44035
 URL: https://issues.apache.org/jira/browse/SPARK-44035
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, Pandas API on Spark, Tests
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43653) Enable GroupBySlowParityTests.test_split_apply_combine_on_series

2023-06-13 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-43653.
-
Resolution: Duplicate

This is a duplicate of SPARK-43445.

> Enable GroupBySlowParityTests.test_split_apply_combine_on_series
> 
>
> Key: SPARK-43653
> URL: https://issues.apache.org/jira/browse/SPARK-43653
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable GroupBySlowParityTests.test_split_apply_combine_on_series



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43652) Enable GroupBy.rank with Spark Connect

2023-06-13 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-43652.
-
Resolution: Duplicate

This is a duplicate of SPARK-43611.

> Enable GroupBy.rank with Spark Connect
> --
>
> Key: SPARK-43652
> URL: https://issues.apache.org/jira/browse/SPARK-43652
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable GroupBy.rank with Spark Connect.
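A minimal hedged sketch of the API this ticket enables under Spark Connect (the data is illustrative):

{code:python}
# Sketch only: per-group ranks via the pandas API on Spark.
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1, 1, 2, 2], "b": [10, 20, 30, 30]})
print(psdf.groupby("a")["b"].rank())  # ranks of b within each group of a
{code}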



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org