[jira] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue
[ https://issues.apache.org/jira/browse/SPARK-44050 ] Harshwardhan Singh Dodiya deleted comment on SPARK-44050: --- was (Author: JIRAUSER300640): !image-2023-06-14-11-07-36-960.png! > Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue > -- > > Key: SPARK-44050 > URL: https://issues.apache.org/jira/browse/SPARK-44050 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.3.1 >Reporter: Harshwardhan Singh Dodiya >Priority: Critical > Attachments: image-2023-06-14-11-07-36-960.png > > > Dear Spark community, > I am facing an issue with mounting a ConfigMap in the driver pod of my Spark application. Upon investigation, I realized that the problem is caused by the ConfigMap not being created successfully. > *Problem Description:* > When attempting to mount the ConfigMap in the driver pod, the mount fails and the pod stays stuck in the ContainerCreating state. Upon further investigation, I discovered that the ConfigMap does not exist in the Kubernetes cluster, which leaves the driver pod unable to access the required configuration data. > *Additional Information:* > I would like to highlight that this issue is not a frequent occurrence. It has been observed randomly, affecting the mounting of the ConfigMap in the driver pod only approximately 5% of the time. This intermittent behavior complicates troubleshooting, as the issue is challenging to reproduce consistently. > *Error Message:* > When describing the driver pod (kubectl describe pod pod_name), the following error is shown: > "ConfigMap '' not found." > *To Reproduce:* > 1. Download Spark 3.3.1 from [https://spark.apache.org/downloads.html] > 2. Create an image with "bin/docker-image-tool.sh" > 3. Submit the application from the Spark client via a bash command, passing all the details and configurations. > 4. The issue appears randomly in some of the driver pods. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue
[ https://issues.apache.org/jira/browse/SPARK-44050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732373#comment-17732373 ] Harshwardhan Singh Dodiya commented on SPARK-44050: --- !image-2023-06-14-11-07-36-960.png! > Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue > -- > > Key: SPARK-44050 > URL: https://issues.apache.org/jira/browse/SPARK-44050 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.3.1 >Reporter: Harshwardhan Singh Dodiya >Priority: Critical > Attachments: image-2023-06-14-11-07-36-960.png > > > Dear Spark community, > I am facing an issue with mounting a ConfigMap in the driver pod of my Spark application. Upon investigation, I realized that the problem is caused by the ConfigMap not being created successfully. > *Problem Description:* > When attempting to mount the ConfigMap in the driver pod, the mount fails and the pod stays stuck in the ContainerCreating state. Upon further investigation, I discovered that the ConfigMap does not exist in the Kubernetes cluster, which leaves the driver pod unable to access the required configuration data. > *Additional Information:* > I would like to highlight that this issue is not a frequent occurrence. It has been observed randomly, affecting the mounting of the ConfigMap in the driver pod only approximately 5% of the time. This intermittent behavior complicates troubleshooting, as the issue is challenging to reproduce consistently. > *Error Message:* > When describing the driver pod (kubectl describe pod pod_name), the following error is shown: > "ConfigMap '' not found." > *To Reproduce:* > 1. Download Spark 3.3.1 from [https://spark.apache.org/downloads.html] > 2. Create an image with "bin/docker-image-tool.sh" > 3. Submit the application from the Spark client via a bash command, passing all the details and configurations. > 4. The issue appears randomly in some of the driver pods. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue
[ https://issues.apache.org/jira/browse/SPARK-44050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshwardhan Singh Dodiya updated SPARK-44050: -- Attachment: image-2023-06-14-11-07-36-960.png > Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue > -- > > Key: SPARK-44050 > URL: https://issues.apache.org/jira/browse/SPARK-44050 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.3.1 >Reporter: Harshwardhan Singh Dodiya >Priority: Critical > Attachments: image-2023-06-14-11-07-36-960.png > > > Dear Spark community, > I am facing an issue with mounting a ConfigMap in the driver pod of my Spark application. Upon investigation, I realized that the problem is caused by the ConfigMap not being created successfully. > *Problem Description:* > When attempting to mount the ConfigMap in the driver pod, the mount fails and the pod stays stuck in the ContainerCreating state. Upon further investigation, I discovered that the ConfigMap does not exist in the Kubernetes cluster, which leaves the driver pod unable to access the required configuration data. > *Additional Information:* > I would like to highlight that this issue is not a frequent occurrence. It has been observed randomly, affecting the mounting of the ConfigMap in the driver pod only approximately 5% of the time. This intermittent behavior complicates troubleshooting, as the issue is challenging to reproduce consistently. > *Error Message:* > When describing the driver pod (kubectl describe pod pod_name), the following error is shown: > "ConfigMap '' not found." > *To Reproduce:* > 1. Download Spark 3.3.1 from [https://spark.apache.org/downloads.html] > 2. Create an image with "bin/docker-image-tool.sh" > 3. Submit the application from the Spark client via a bash command, passing all the details and configurations. > 4. The issue appears randomly in some of the driver pods. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue
Harshwardhan Singh Dodiya created SPARK-44050: - Summary: Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue Key: SPARK-44050 URL: https://issues.apache.org/jira/browse/SPARK-44050 Project: Spark Issue Type: Bug Components: Kubernetes, Spark Submit Affects Versions: 3.3.1 Reporter: Harshwardhan Singh Dodiya Dear Spark community, I am facing an issue with mounting a ConfigMap in the driver pod of my Spark application. Upon investigation, I realized that the problem is caused by the ConfigMap not being created successfully. *Problem Description:* When attempting to mount the ConfigMap in the driver pod, the mount fails and the pod stays stuck in the ContainerCreating state. Upon further investigation, I discovered that the ConfigMap does not exist in the Kubernetes cluster, which leaves the driver pod unable to access the required configuration data. *Additional Information:* I would like to highlight that this issue is not a frequent occurrence. It has been observed randomly, affecting the mounting of the ConfigMap in the driver pod only approximately 5% of the time. This intermittent behavior complicates troubleshooting, as the issue is challenging to reproduce consistently. *Error Message:* When describing the driver pod (kubectl describe pod pod_name), the following error is shown: "ConfigMap '' not found." *To Reproduce:* 1. Download Spark 3.3.1 from [https://spark.apache.org/downloads.html] 2. Create an image with "bin/docker-image-tool.sh" 3. Submit the application from the Spark client via a bash command, passing all the details and configurations. 4. The issue appears randomly in some of the driver pods. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44048) Remove sql-migration-old.md
[ https://issues.apache.org/jira/browse/SPARK-44048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732353#comment-17732353 ] Snoot.io commented on SPARK-44048: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/41583 > Remove sql-migration-old.md > --- > > Key: SPARK-44048 > URL: https://issues.apache.org/jira/browse/SPARK-44048 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43932) Add current_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732352#comment-17732352 ] Snoot.io commented on SPARK-43932: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/41516 > Add current_* functions to Scala and Python > --- > > Key: SPARK-43932 > URL: https://issues.apache.org/jira/browse/SPARK-43932 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > > Add following functions: > * curdate > * current_catalog > * current_database > * current_schema > * current_timezone > * current_user > * user > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
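A minimal Scala sketch of how these functions can be exercised once the change lands; the names are taken from the ticket's list, and the exact signatures are an assumption until the linked PR merges:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{curdate, current_catalog, current_database, current_timezone, current_user}

  val spark = SparkSession.builder().master("local[*]").appName("current-fns-demo").getOrCreate()
  // Each function returns session-level context, so no input column is needed.
  spark.range(1)
    .select(curdate(), current_catalog(), current_database(), current_timezone(), current_user())
    .show(truncate = false)

All of these evaluate against the active session, which makes them handy for smoke-testing catalog and timezone configuration.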
[jira] [Commented] (SPARK-44045) Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-44045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732351#comment-17732351 ] Snoot.io commented on SPARK-44045: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/41579 > Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest` > - > > Key: SPARK-44045 > URL: https://issues.apache.org/jira/browse/SPARK-44045 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43932) Add current_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43932. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41516 [https://github.com/apache/spark/pull/41516] > Add current_* functions to Scala and Python > --- > > Key: SPARK-43932 > URL: https://issues.apache.org/jira/browse/SPARK-43932 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > > Add following functions: > * curdate > * current_catalog > * current_database > * current_schema > * current_timezone > * current_user > * user > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43932) Add current_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43932: - Assignee: Ruifeng Zheng > Add current_* functions to Scala and Python > --- > > Key: SPARK-43932 > URL: https://issues.apache.org/jira/browse/SPARK-43932 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > > Add following functions: > * curdate > * current_catalog > * current_database > * current_schema > * current_timezone > * current_user > * user > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43981) Basic saving / loading implementation
[ https://issues.apache.org/jira/browse/SPARK-43981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu resolved SPARK-43981. Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41478 [https://github.com/apache/spark/pull/41478] > Basic saving / loading implementation > - > > Key: SPARK-43981 > URL: https://issues.apache.org/jira/browse/SPARK-43981 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > Fix For: 3.5.0 > > > Support saving/loading for estimator / transformer / evaluator / model. > We have some design goals: > * The model format is decoupled from Spark, i.e. we can run model inference without a Spark service. > * We can save a model to either the local file system or a cloud storage file system. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
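The ticket targets the Spark Connect ML path, but the save/load shape it describes mirrors the long-standing spark.ml persistence API. For orientation, a minimal Scala sketch of that existing API, assuming `model` is an already-fitted PipelineModel and with placeholder paths:

  import org.apache.spark.ml.PipelineModel

  // Save to a cloud storage file system (any Hadoop-compatible URI works)...
  model.write.overwrite().save("s3a://my-bucket/models/pipeline-v1")
  // ...or to the local file system, then load it back without retraining.
  model.write.overwrite().save("file:/tmp/models/pipeline-v1")
  val reloaded = PipelineModel.load("file:/tmp/models/pipeline-v1")

Per the design goals above, the new implementation additionally aims for a saved format that can be read for inference without any Spark service at all.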
[jira] [Commented] (SPARK-43655) Enable NamespaceParityTests.test_get_index_map
[ https://issues.apache.org/jira/browse/SPARK-43655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732348#comment-17732348 ] Snoot.io commented on SPARK-43655: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/41587 > Enable NamespaceParityTests.test_get_index_map > -- > > Key: SPARK-43655 > URL: https://issues.apache.org/jira/browse/SPARK-43655 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Enable NamespaceParityTests.test_get_index_map -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43654) Enable InternalFrameParityTests.test_from_pandas
[ https://issues.apache.org/jira/browse/SPARK-43654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732347#comment-17732347 ] Snoot.io commented on SPARK-43654: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/41587 > Enable InternalFrameParityTests.test_from_pandas > > > Key: SPARK-43654 > URL: https://issues.apache.org/jira/browse/SPARK-43654 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Enable InternalFrameParityTests.test_from_pandas -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43655) Enable NamespaceParityTests.test_get_index_map
[ https://issues.apache.org/jira/browse/SPARK-43655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732346#comment-17732346 ] Snoot.io commented on SPARK-43655: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/41587 > Enable NamespaceParityTests.test_get_index_map > -- > > Key: SPARK-43655 > URL: https://issues.apache.org/jira/browse/SPARK-43655 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Enable NamespaceParityTests.test_get_index_map -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43654) Enable InternalFrameParityTests.test_from_pandas
[ https://issues.apache.org/jira/browse/SPARK-43654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732345#comment-17732345 ] Snoot.io commented on SPARK-43654: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/41587 > Enable InternalFrameParityTests.test_from_pandas > > > Key: SPARK-43654 > URL: https://issues.apache.org/jira/browse/SPARK-43654 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Enable InternalFrameParityTests.test_from_pandas -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43474) Add support to create DataFrame Reference in Spark connect
[ https://issues.apache.org/jira/browse/SPARK-43474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732344#comment-17732344 ] Snoot.io commented on SPARK-43474: -- User 'rangadi' has created a pull request for this issue: https://github.com/apache/spark/pull/41580 > Add support to create DataFrame Reference in Spark connect > -- > > Key: SPARK-43474 > URL: https://issues.apache.org/jira/browse/SPARK-43474 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Peng Zhong >Priority: Major > > Add support in Spark Connect for caching a DataFrame on the server side. From the client side, a reference to that DataFrame can then be created given the cache key. > > This function will be used in streaming foreachBatch(). The server needs to call the user function for every batch, and that function takes a DataFrame as an argument. With the new function, we can simply cache the DataFrame on the server and pass the id back to the client, which can create the DataFrame reference. The server will replace the reference with the cached DataFrame when transforming the plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
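A conceptual Scala sketch of the handshake described above; every name here (DataFrameCache, cacheDataFrame, resolveReference) is hypothetical and only illustrates the cache-key scheme, not Spark Connect's actual internals:

  import java.util.UUID
  import java.util.concurrent.ConcurrentHashMap
  import org.apache.spark.sql.DataFrame

  object DataFrameCache {
    private val cached = new ConcurrentHashMap[String, DataFrame]()

    // Server side: cache the per-batch DataFrame and return the key the client will hold.
    def cacheDataFrame(df: DataFrame): String = {
      val key = UUID.randomUUID().toString
      cached.put(key, df)
      key
    }

    // During plan transformation: swap the client's reference node for the cached DataFrame.
    def resolveReference(key: String): DataFrame =
      Option(cached.get(key)).getOrElse(
        throw new NoSuchElementException(s"No cached DataFrame for key $key"))
  }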
[jira] [Commented] (SPARK-44049) Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup
[ https://issues.apache.org/jira/browse/SPARK-44049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732341#comment-17732341 ] Snoot.io commented on SPARK-44049: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/41586 > Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup > -- > > Key: SPARK-44049 > URL: https://issues.apache.org/jira/browse/SPARK-44049 > Project: Spark > Issue Type: Test > Components: Kubernetes, Tests >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44049) Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup
[ https://issues.apache.org/jira/browse/SPARK-44049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732342#comment-17732342 ] Snoot.io commented on SPARK-44049: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/41586 > Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup > -- > > Key: SPARK-44049 > URL: https://issues.apache.org/jira/browse/SPARK-44049 > Project: Spark > Issue Type: Test > Components: Kubernetes, Tests >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44049) Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup
Dongjoon Hyun created SPARK-44049: - Summary: Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup Key: SPARK-44049 URL: https://issues.apache.org/jira/browse/SPARK-44049 Project: Spark Issue Type: Test Components: Kubernetes, Tests Affects Versions: 3.5.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43622) Enable pyspark.pandas.spark.functions.var in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43622: - Assignee: Ruifeng Zheng > Enable pyspark.pandas.spark.functions.var in Spark Connect. > --- > > Key: SPARK-43622 > URL: https://issues.apache.org/jira/browse/SPARK-43622 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > > Enable pyspark.pandas.spark.functions.var in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43663) Enable SeriesParityTests.test_compare
[ https://issues.apache.org/jira/browse/SPARK-43663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43663: - Assignee: Haejoon Lee > Enable SeriesParityTests.test_compare > - > > Key: SPARK-43663 > URL: https://issues.apache.org/jira/browse/SPARK-43663 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable SeriesParityTests.test_compare -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43663) Enable SeriesParityTests.test_compare
[ https://issues.apache.org/jira/browse/SPARK-43663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43663. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41567 [https://github.com/apache/spark/pull/41567] > Enable SeriesParityTests.test_compare > - > > Key: SPARK-43663 > URL: https://issues.apache.org/jira/browse/SPARK-43663 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > > Enable SeriesParityTests.test_compare -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44048) Remove sql-migration-old.md
[ https://issues.apache.org/jira/browse/SPARK-44048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732336#comment-17732336 ] Yuming Wang commented on SPARK-44048: - https://github.com/apache/spark/pull/41583 > Remove sql-migration-old.md > --- > > Key: SPARK-44048 > URL: https://issues.apache.org/jira/browse/SPARK-44048 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44048) Remove sql-migration-old.md
Yuming Wang created SPARK-44048: --- Summary: Remove sql-migration-old.md Key: SPARK-44048 URL: https://issues.apache.org/jira/browse/SPARK-44048 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.5.0 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table
[ https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732320#comment-17732320 ] Jia Fan commented on SPARK-43486: - I couldn't reproduce it either. :( > number of files read is incorrect if it is bucket table > --- > > Key: SPARK-43486 > URL: https://issues.apache.org/jira/browse/SPARK-43486 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > Attachments: screenshot-1.png > > > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44021) Add spark.sql.files.maxPartitionNum
[ https://issues.apache.org/jira/browse/SPARK-44021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44021: -- Summary: Add spark.sql.files.maxPartitionNum (was: Add a config to make it do not generate too many partitions) > Add spark.sql.files.maxPartitionNum > --- > > Key: SPARK-44021 > URL: https://issues.apache.org/jira/browse/SPARK-44021 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
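Judging from the final config name and the ticket's original goal of capping how many partitions a file scan generates, usage would presumably look like the following spark-shell sketch; the exact semantics are defined by the linked PR, and the path is a placeholder:

  // Cap the number of partitions generated when reading file-based sources.
  spark.conf.set("spark.sql.files.maxPartitionNum", "10000")
  val df = spark.read.parquet("/data/events")
  // The scan's partition count should now stay at or below the configured cap.
  println(df.rdd.getNumPartitions)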
[jira] [Assigned] (SPARK-44021) Add a config to make it do not generate too many partitions
[ https://issues.apache.org/jira/browse/SPARK-44021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44021: - Assignee: Yuming Wang > Add a config to make it do not generate too many partitions > --- > > Key: SPARK-44021 > URL: https://issues.apache.org/jira/browse/SPARK-44021 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44021) Add a config to make it do not generate too many partitions
[ https://issues.apache.org/jira/browse/SPARK-44021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44021. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41545 [https://github.com/apache/spark/pull/41545] > Add a config to make it do not generate too many partitions > --- > > Key: SPARK-44021 > URL: https://issues.apache.org/jira/browse/SPARK-44021 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44047) Upgrade google guava for connect from 31.0.1-jre to 32.0.1-jre
BingKun Pan created SPARK-44047: --- Summary: Upgrade google guava for connect from 31.0.1-jre to 32.0.1-jre Key: SPARK-44047 URL: https://issues.apache.org/jira/browse/SPARK-44047 Project: Spark Issue Type: Improvement Components: Build, Connect Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table
[ https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732308#comment-17732308 ] BingKun Pan commented on SPARK-43486: - Sorry, I couldn't reproduce it. > number of files read is incorrect if it is bucket table > --- > > Key: SPARK-43486 > URL: https://issues.apache.org/jira/browse/SPARK-43486 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > Attachments: screenshot-1.png > > > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43934) Add regexp_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43934: - Assignee: jiaan.geng > Add regexp_* functions to Scala and Python > -- > > Key: SPARK-43934 > URL: https://issues.apache.org/jira/browse/SPARK-43934 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: jiaan.geng >Priority: Major > > Add following functions: > * rlike > * regexp > * regexp_count > * regexp_extract_all > * regexp_instr > * regexp_like > * regexp_substr > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43934) Add regexp_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43934. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41515 [https://github.com/apache/spark/pull/41515] > Add regexp_* functions to Scala and Python > -- > > Key: SPARK-43934 > URL: https://issues.apache.org/jira/browse/SPARK-43934 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > > Add following functions: > * rlike > * regexp > * regexp_count > * regexp_extract_all > * regexp_instr > * regexp_like > * regexp_substr > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
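A short spark-shell sketch of the new Scala API, assuming the functions take Column arguments mirroring their SQL counterparts:

  import org.apache.spark.sql.functions._
  import spark.implicits._

  val df = Seq("abc123def456").toDF("s")
  df.select(
    regexp_like($"s", lit("[0-9]+")),   // true: at least one digit run matches
    regexp_count($"s", lit("[0-9]+")),  // 2: number of non-overlapping matches
    regexp_substr($"s", lit("[0-9]+"))  // "123": the first match
  ).show()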
[jira] [Assigned] (SPARK-43691) Enable NumOpsParityTests.test_ne.
[ https://issues.apache.org/jira/browse/SPARK-43691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43691: Assignee: Haejoon Lee > Enable NumOpsParityTests.test_ne. > - > > Key: SPARK-43691 > URL: https://issues.apache.org/jira/browse/SPARK-43691 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43684) Fix NullOps.eq to work with Spark Connect Column.
[ https://issues.apache.org/jira/browse/SPARK-43684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43684: Assignee: Haejoon Lee > Fix NullOps.eq to work with Spark Connect Column. > - > > Key: SPARK-43684 > URL: https://issues.apache.org/jira/browse/SPARK-43684 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43684) Fix NullOps.eq to work with Spark Connect Column.
[ https://issues.apache.org/jira/browse/SPARK-43684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43684. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41514 [https://github.com/apache/spark/pull/41514] > Fix NullOps.eq to work with Spark Connect Column. > - > > Key: SPARK-43684 > URL: https://issues.apache.org/jira/browse/SPARK-43684 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43691) Enable NumOpsParityTests.test_ne.
[ https://issues.apache.org/jira/browse/SPARK-43691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43691. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41514 [https://github.com/apache/spark/pull/41514] > Enable NumOpsParityTests.test_ne. > - > > Key: SPARK-43691 > URL: https://issues.apache.org/jira/browse/SPARK-43691 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43686) Enable NumOpsParityTests.test_eq
[ https://issues.apache.org/jira/browse/SPARK-43686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43686. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41514 [https://github.com/apache/spark/pull/41514] > Enable NumOpsParityTests.test_eq > > > Key: SPARK-43686 > URL: https://issues.apache.org/jira/browse/SPARK-43686 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43685) Fix NullOps.ne to work with Spark Connect Column.
[ https://issues.apache.org/jira/browse/SPARK-43685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43685: Assignee: Haejoon Lee > Fix NullOps.ne to work with Spark Connect Column. > - > > Key: SPARK-43685 > URL: https://issues.apache.org/jira/browse/SPARK-43685 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43685) Fix NullOps.ne to work with Spark Connect Column.
[ https://issues.apache.org/jira/browse/SPARK-43685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43685. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41514 [https://github.com/apache/spark/pull/41514] > Fix NullOps.ne to work with Spark Connect Column. > - > > Key: SPARK-43685 > URL: https://issues.apache.org/jira/browse/SPARK-43685 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43686) Enable NumOpsParityTests.test_eq
[ https://issues.apache.org/jira/browse/SPARK-43686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43686: Assignee: Haejoon Lee > Enable NumOpsParityTests.test_eq > > > Key: SPARK-43686 > URL: https://issues.apache.org/jira/browse/SPARK-43686 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44046) Pyspark StreamingQueryListener listListener
Wei Liu created SPARK-44046: --- Summary: Pyspark StreamingQueryListener listListener Key: SPARK-44046 URL: https://issues.apache.org/jira/browse/SPARK-44046 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43922) Add named argument support in parser for function call
[ https://issues.apache.org/jira/browse/SPARK-43922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732290#comment-17732290 ] Hudson commented on SPARK-43922: User 'learningchess2003' has created a pull request for this issue: https://github.com/apache/spark/pull/41429 > Add named argument support in parser for function call > -- > > Key: SPARK-43922 > URL: https://issues.apache.org/jira/browse/SPARK-43922 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: Richard Yu >Priority: Major > > Today, we are implementing named argument support for user-defined functions, some built-in functions, and table-valued functions. As a first step towards building such a feature, we need to make some requisite changes in the parser. > To accomplish this, in this issue, we plan to add some new syntax tokens to the parser in Spark. Changes will also be made in the abstract syntax tree builder to reflect these new tokens. Such changes will first be restricted to normal function calls (table-valued functions will be treated separately). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
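As an illustration of what the new tokens enable, named arguments in SQL function calls use the param => value form. The sketch below uses the built-in mask function purely for illustration; which functions actually accept named arguments depends on follow-up work, not on the parser change alone:

  // Run in spark-shell: later optional parameters can be set by name, skipping earlier ones.
  spark.sql("SELECT mask('AbCD123-@$#', lowerChar => 'q', upperChar => 'Q')").show(truncate = false)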
[jira] [Assigned] (SPARK-38162) Optimize one row plan in normal and AQE Optimizer
[ https://issues.apache.org/jira/browse/SPARK-38162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-38162: --- Assignee: XiDuo You > Optimize one row plan in normal and AQE Optimizer > - > > Key: SPARK-38162 > URL: https://issues.apache.org/jira/browse/SPARK-38162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.3.0 > > > Optimize the plan if its max row count is equal to or less than 1, in these cases: > - if the child of a sort has at most 1 row, remove the sort > - if the child of a local sort has at most 1 row per partition, remove the local sort > - if the child of an aggregate has at most 1 row and the aggregate is grouping-only (including the rewritten distinct plan), remove the aggregate > - if the child of an aggregate has at most 1 row, set distinct to false in all aggregate expressions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
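A quick spark-shell illustration of the first case, assuming the rule fires as described: spark.range(1) reports a max row count of 1, so the optimizer can drop the sort outright.

  // The optimized plan should contain no Sort node, since at most one row flows through it.
  spark.range(1).sort("id").explain()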
[jira] [Assigned] (SPARK-44045) Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-44045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44045: - Assignee: Dongjoon Hyun > Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest` > - > > Key: SPARK-44045 > URL: https://issues.apache.org/jira/browse/SPARK-44045 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44045) Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-44045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44045. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41579 [https://github.com/apache/spark/pull/41579] > Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest` > - > > Key: SPARK-44045 > URL: https://issues.apache.org/jira/browse/SPARK-44045 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44045) Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest`
Dongjoon Hyun created SPARK-44045: - Summary: Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest` Key: SPARK-44045 URL: https://issues.apache.org/jira/browse/SPARK-44045 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 3.5.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9
[ https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732236#comment-17732236 ] Dongjoon Hyun commented on SPARK-44041: --- Feel free to ping me when you make a PR~ I'm highly interested in validating and bringing this into the Apache Spark repo, [~LuciferYang]. > Upgrade ammonite to 2.5.9 > - > > Key: SPARK-44041 > URL: https://issues.apache.org/jira/browse/SPARK-44041 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > To support Scala 2.12.18 & 2.13.11. > > A release tag already exists: > [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44044) Improve Error message for SQL Window functions
[ https://issues.apache.org/jira/browse/SPARK-44044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732218#comment-17732218 ] Siying Dong commented on SPARK-44044: - OSS PR created: [https://github.com/apache/spark/pull/41578/] [~kabhwan] can you help take a look? > Improve Error message for SQL Window functions > -- > > Key: SPARK-44044 > URL: https://issues.apache.org/jira/browse/SPARK-44044 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Siying Dong >Priority: Trivial > > Right now, if a window spec is used with a streaming query, the error message looks like the following: > Non-time-based windows are not supported on streaming DataFrames/Datasets; > Window [... > The message isn't very helpful for identifying what the problem is, and some customers and even support engineers have been confused by it. It is suggested that we call out the aggregation function over the window spec so that users can locate the part of the query that caused the problem more easily. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44044) Improve Error message for SQL Window functions
Siying Dong created SPARK-44044: --- Summary: Improve Error message for SQL Window functions Key: SPARK-44044 URL: https://issues.apache.org/jira/browse/SPARK-44044 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.5.0 Reporter: Siying Dong Right now, if a window spec is used with a streaming query, the error message looks like the following: Non-time-based windows are not supported on streaming DataFrames/Datasets; Window [... The message isn't very helpful for identifying what the problem is, and some customers and even support engineers have been confused by it. It is suggested that we call out the aggregation function over the window spec so that users can locate the part of the query that caused the problem more easily. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
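To make the report concrete, the following spark-shell sketch reproduces the current error with a rate source; the more specific message naming the offending window expression is what this ticket proposes, with the exact wording left to the PR:

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions._
  import spark.implicits._

  val stream = spark.readStream.format("rate").load()  // columns: timestamp, value
  val w = Window.partitionBy($"value" % 10).orderBy($"timestamp")
  // Fails analysis: "Non-time-based windows are not supported on streaming DataFrames/Datasets"
  stream.withColumn("rn", row_number().over(w))
    .writeStream.format("console").start()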
[jira] [Commented] (SPARK-44018) Improve the hashCode for Some DS V2 Expression
[ https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732189#comment-17732189 ] Dongjoon Hyun commented on SPARK-44018: --- It seems that I'm confused. "Improve XXX" means this is not a bug fix. Is this just an improvement PR for Apache Spark 3.5.0? > Improve the hashCode for Some DS V2 Expression > -- > > Key: SPARK-44018 > URL: https://issues.apache.org/jira/browse/SPARK-44018 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not good enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are missing hashCode() entirely. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44018) Improve the hashCode for Some DS V2 Expression
[ https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732188#comment-17732188 ] Dongjoon Hyun commented on SPARK-44018: --- Hi, [~beliefer]. We need to update the `Affected Version` of this JIRA. This JIRA's affected version is 3.5.0, which means it is irrelevant to `branch-3.4` and Apache Spark 3.4.1. > Improve the hashCode for Some DS V2 Expression > -- > > Key: SPARK-44018 > URL: https://issues.apache.org/jira/browse/SPARK-44018 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not good enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are missing hashCode() entirely. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44043) Reuse main scan exchange in group-based UPDATEs
Anton Okolnychyi created SPARK-44043: Summary: Reuse main scan exchange in group-based UPDATEs Key: SPARK-44043 URL: https://issues.apache.org/jira/browse/SPARK-44043 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Anton Okolnychyi Group-based UPDATE operations rewritten using UNION should reuse the main scan exchange. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44040: Affects Version/s: 3.3.2 > Incorrect result after count distinct > - > > Key: SPARK-44040 > URL: https://issues.apache.org/jira/browse/SPARK-44040 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Aleksandr Aleksandrov >Priority: Critical > > When I try to call count after the distinct function on a nullable Decimal field, Spark returns an incorrect result starting from Spark 3.4.0. > A minimal example to reproduce: > import org.apache.spark.sql.types._ > import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession} > import org.apache.spark.sql.types.{StringType, StructField, StructType} > val schema = StructType( Array( > StructField("money", DecimalType(38,6), true), > StructField("reference_id", StringType, true) > )) > val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) > val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) > val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", lit("df2")) > val unionDF: DataFrame = aggDf.union(aggDf1) > unionDF.select("money").distinct.show // returns the correct result > unionDF.select("money").distinct.count // returns 2 instead of 1 > unionDF.select("money").distinct.count == 1 // returns false > This block of code throws an assertion error and after that returns an incorrect count (in Spark 3.2.1 everything works fine and I get the correct result = 1): > *scala> unionDF.select("money").distinct.show // returns the correct result* > java.lang.AssertionError: assertion failed: > Decimal$DecimalIsFractional > while compiling: > during phase: globalPhase=terminal, enteringPhase=jvm > library version: version 2.12.17 > compiler version: version 2.12.17 > reconstructed args: -classpath /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar -Yrepl-class-based -Yrepl-outdir /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 > last tree to typer: TypeTree(class Byte) > tree position: line 6 of > tree tpe: Byte > symbol: (final abstract) class Byte in package scala > symbol definition: final abstract class Byte extends (a ClassSymbol) > symbol package: scala > symbol owners: class Byte > call site: constructor $eval in object $eval in package $line19 > == Source file context for tree position == > 3 > 4 object $eval { > 5 lazy val $result = $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 > 6 lazy val $print: _root_.java.lang.String = { > 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw > 8 > 9 "" > at scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) > at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) > at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) > at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) > at scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) > at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) > at scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) > at scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802) > at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799) > at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805) > at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28) > at scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324) > at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342) > at scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645) > at scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413) > at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357) > at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188) > at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357) > at >
[jira] [Resolved] (SPARK-44016) Artifacts with name as an absolute path may overwrite other files
[ https://issues.apache.org/jira/browse/SPARK-44016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44016. --- Resolution: Fixed > Artifacts with name as an absolute path may overwrite other files > -- > > Key: SPARK-44016 > URL: https://issues.apache.org/jira/browse/SPARK-44016 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Venkata Sai Akhil Gudesa >Priority: Major > Fix For: 3.5.0 > > > In `SparkConnectAddArtifactsHandler`, an artifact being moved to a staging location may overwrite another file when the `name`/`path` of the artifact is an `absolute` path. > This happens because the [stagedPath|https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala#L172] is computed with the `.resolve(...)` method, and `resolve` returns the `other` path (in this case, the name of the artifact) whenever that `other` path is absolute. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
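The java.nio.file.Path.resolve behavior behind the bug is easy to demonstrate in isolation; this standalone Scala snippet shows how an absolute artifact name escapes the staging directory:

  import java.nio.file.Paths

  val stagingDir = Paths.get("/tmp/artifacts/staging")
  println(stagingDir.resolve("jars/my.jar"))  // /tmp/artifacts/staging/jars/my.jar
  println(stagingDir.resolve("/etc/passwd"))  // /etc/passwd (the absolute argument wins, escaping the staging dir)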
[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9
[ https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732173#comment-17732173 ] Dongjoon Hyun commented on SPARK-44041: --- Great! Thank you for making this happen, [~LuciferYang]! > Upgrade ammonite to 2.5.9 > - > > Key: SPARK-44041 > URL: https://issues.apache.org/jira/browse/SPARK-44041 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > To support Scala 2.12.18 & 2.13.11. > > A release tag already exists: > [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732170#comment-17732170 ] Yuming Wang commented on SPARK-44040: - https://github.com/apache/spark/pull/41576 > Incorrect result after count distinct > - > > Key: SPARK-44040 > URL: https://issues.apache.org/jira/browse/SPARK-44040 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Aleksandr Aleksandrov >Priority: Critical > > When i try to call count after distinct function for Decimal null field, > spark return incorrect result starting from spark 3.4.0. > A minimal example to reproduce: > import org.apache.spark.sql.types._ > import org.apache.spark.sql.\{Column, DataFrame, Dataset, Row, SparkSession} > import org.apache.spark.sql.types.\{StringType, StructField, StructType} > val schema = StructType( Array( > StructField("money", DecimalType(38,6), true), > StructField("reference_id", StringType, true) > )) > val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) > val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) > val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", > lit("df2")) > val unionDF: DataFrame = aggDf.union(aggDf1) > unionDF.select("money").distinct.show // return correct result > unionDF.select("money").distinct.count // return 2 instead of 1 > unionDF.select("money").distinct.count == 1 // return false > This block of code returns some assertion error and after that an incorrect > count (in spark 3.2.1 everything works fine and i get correct result = 1): > *scala> unionDF.select("money").distinct.show // return correct result* > java.lang.AssertionError: assertion failed: > Decimal$DecimalIsFractional > while compiling: > during phase: globalPhase=terminal, enteringPhase=jvm > library version: version 2.12.17 > compiler version: version 2.12.17 > reconstructed args: -classpath > /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar > -Yrepl-class-based -Yrepl-outdir > /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 > last tree to typer: TypeTree(class Byte) > tree position: line 6 of > tree tpe: Byte > symbol: (final abstract) class Byte in package scala > symbol definition: final abstract class Byte extends (a ClassSymbol) > symbol package: scala > symbol owners: class Byte > call site: constructor $eval in object $eval in package $line19 > == Source file context for tree position == > 3 > 4object $eval { > 5lazyval $result = > $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 > 6lazyval $print: {_}root{_}.java.lang.String = { > 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw > 8 > 9"" > at > scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) > at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) > at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) > at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) > at > 
scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) > at > scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) > at > scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802) > at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799) > at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805) > at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28) > at > scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645) > at > scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413) > at > scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357) > at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188) > at >
[jira] [Commented] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732163#comment-17732163 ] Bruce Robbins commented on SPARK-44040: --- It seems this can be reproduced in {{spark-sql}} as well. Interestingly, turning off AQE seems to fix the issue (for both the above dataframe version and the below SQL version): {noformat} spark-sql (default)> create or replace temp view v1 as select 1 as c1 limit 0; Time taken: 0.959 seconds spark-sql (default)> create or replace temp view agg1 as select sum(c1) as c1, "agg1" as name from v1; Time taken: 0.16 seconds spark-sql (default)> create or replace temp view agg2 as select sum(c1) as c1, "agg2" as name from v1; Time taken: 0.035 seconds spark-sql (default)> create or replace temp view union1 as select * from agg1 union select * from agg2; Time taken: 0.088 seconds spark-sql (default)> -- the following incorrectly produces 2 rows select distinct c1 from union1; NULL NULL Time taken: 1.649 seconds, Fetched 2 row(s) spark-sql (default)> set spark.sql.adaptive.enabled=false; spark.sql.adaptive.enabled false Time taken: 0.019 seconds, Fetched 1 row(s) spark-sql (default)> -- the following correctly produces 1 row select distinct c1 from union1; NULL Time taken: 1.372 seconds, Fetched 1 row(s) spark-sql (default)> {noformat} > Incorrect result after count distinct > - > > Key: SPARK-44040 > URL: https://issues.apache.org/jira/browse/SPARK-44040 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Aleksandr Aleksandrov >Priority: Critical > > When i try to call count after distinct function for Decimal null field, > spark return incorrect result starting from spark 3.4.0. > A minimal example to reproduce: > import org.apache.spark.sql.types._ > import org.apache.spark.sql.\{Column, DataFrame, Dataset, Row, SparkSession} > import org.apache.spark.sql.types.\{StringType, StructField, StructType} > val schema = StructType( Array( > StructField("money", DecimalType(38,6), true), > StructField("reference_id", StringType, true) > )) > val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) > val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) > val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", > lit("df2")) > val unionDF: DataFrame = aggDf.union(aggDf1) > unionDF.select("money").distinct.show // return correct result > unionDF.select("money").distinct.count // return 2 instead of 1 > unionDF.select("money").distinct.count == 1 // return false > This block of code returns some assertion error and after that an incorrect > count (in spark 3.2.1 everything works fine and i get correct result = 1): > *scala> unionDF.select("money").distinct.show // return correct result* > java.lang.AssertionError: assertion failed: > Decimal$DecimalIsFractional > while compiling: > during phase: globalPhase=terminal, enteringPhase=jvm > library version: version 2.12.17 > compiler version: version 2.12.17 > reconstructed args: -classpath > /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar > -Yrepl-class-based -Yrepl-outdir > 
/private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 > last tree to typer: TypeTree(class Byte) > tree position: line 6 of > tree tpe: Byte > symbol: (final abstract) class Byte in package scala > symbol definition: final abstract class Byte extends (a ClassSymbol) > symbol package: scala > symbol owners: class Byte > call site: constructor $eval in object $eval in package $line19 > == Source file context for tree position == > 3 > 4object $eval { > 5lazyval $result = > $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 > 6lazyval $print: {_}root{_}.java.lang.String = { > 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw > 8 > 9"" > at > scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) > at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) > at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) > at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) > at > scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) > at >
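A sketch of the workaround the comment above implies for the dataframe version as well (assumes the `unionDF` from the issue description; the config key is the standard AQE switch):
{noformat}
// Disable adaptive query execution for the session, then re-run the repro.
spark.conf.set("spark.sql.adaptive.enabled", "false")
unionDF.select("money").distinct.count  // 1, the expected result, per the comment
{noformat}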
[jira] [Commented] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732159#comment-17732159 ] Yuming Wang commented on SPARK-44040: - Thanks for reporting this bug. We will fix it soon. > Incorrect result after count distinct > - > > Key: SPARK-44040 > URL: https://issues.apache.org/jira/browse/SPARK-44040 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Aleksandr Aleksandrov >Priority: Critical > > When i try to call count after distinct function for Decimal null field, > spark return incorrect result starting from spark 3.4.0. > A minimal example to reproduce: > import org.apache.spark.sql.types._ > import org.apache.spark.sql.\{Column, DataFrame, Dataset, Row, SparkSession} > import org.apache.spark.sql.types.\{StringType, StructField, StructType} > val schema = StructType( Array( > StructField("money", DecimalType(38,6), true), > StructField("reference_id", StringType, true) > )) > val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) > val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) > val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", > lit("df2")) > val unionDF: DataFrame = aggDf.union(aggDf1) > unionDF.select("money").distinct.show // return correct result > unionDF.select("money").distinct.count // return 2 instead of 1 > unionDF.select("money").distinct.count == 1 // return false > This block of code returns some assertion error and after that an incorrect > count (in spark 3.2.1 everything works fine and i get correct result = 1): > *scala> unionDF.select("money").distinct.show // return correct result* > java.lang.AssertionError: assertion failed: > Decimal$DecimalIsFractional > while compiling: > during phase: globalPhase=terminal, enteringPhase=jvm > library version: version 2.12.17 > compiler version: version 2.12.17 > reconstructed args: -classpath > /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar > -Yrepl-class-based -Yrepl-outdir > /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 > last tree to typer: TypeTree(class Byte) > tree position: line 6 of > tree tpe: Byte > symbol: (final abstract) class Byte in package scala > symbol definition: final abstract class Byte extends (a ClassSymbol) > symbol package: scala > symbol owners: class Byte > call site: constructor $eval in object $eval in package $line19 > == Source file context for tree position == > 3 > 4object $eval { > 5lazyval $result = > $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 > 6lazyval $print: {_}root{_}.java.lang.String = { > 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw > 8 > 9"" > at > scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) > at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) > at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) > at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) > at > 
scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) > at > scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) > at > scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802) > at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799) > at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805) > at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28) > at > scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645) > at > scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413) > at > scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357) > at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188) > at >
[jira] [Resolved] (SPARK-44028) Upgrade commons-io to 2.13.0
[ https://issues.apache.org/jira/browse/SPARK-44028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44028. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41556 [https://github.com/apache/spark/pull/41556] > Upgrade commons-io to 2.13.0 > > > Key: SPARK-44028 > URL: https://issues.apache.org/jira/browse/SPARK-44028 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0 > > > https://commons.apache.org/proper/commons-io/changes-report.html#a2.13.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44028) Upgrade commons-io to 2.13.0
[ https://issues.apache.org/jira/browse/SPARK-44028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44028: - Assignee: Yang Jie > Upgrade commons-io to 2.13.0 > > > Key: SPARK-44028 > URL: https://issues.apache.org/jira/browse/SPARK-44028 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > https://commons.apache.org/proper/commons-io/changes-report.html#a2.13.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44040: Target Version/s: 3.4.1 > Incorrect result after count distinct > - > > Key: SPARK-44040 > URL: https://issues.apache.org/jira/browse/SPARK-44040 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Aleksandr Aleksandrov >Priority: Critical > > When i try to call count after distinct function for Decimal null field, > spark return incorrect result starting from spark 3.4.0. > A minimal example to reproduce: > import org.apache.spark.sql.types._ > import org.apache.spark.sql.\{Column, DataFrame, Dataset, Row, SparkSession} > import org.apache.spark.sql.types.\{StringType, StructField, StructType} > val schema = StructType( Array( > StructField("money", DecimalType(38,6), true), > StructField("reference_id", StringType, true) > )) > val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) > val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) > val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", > lit("df2")) > val unionDF: DataFrame = aggDf.union(aggDf1) > unionDF.select("money").distinct.show // return correct result > unionDF.select("money").distinct.count // return 2 instead of 1 > unionDF.select("money").distinct.count == 1 // return false > This block of code returns some assertion error and after that an incorrect > count (in spark 3.2.1 everything works fine and i get correct result = 1): > *scala> unionDF.select("money").distinct.show // return correct result* > java.lang.AssertionError: assertion failed: > Decimal$DecimalIsFractional > while compiling: > during phase: globalPhase=terminal, enteringPhase=jvm > library version: version 2.12.17 > compiler version: version 2.12.17 > reconstructed args: -classpath > /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar > -Yrepl-class-based -Yrepl-outdir > /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 > last tree to typer: TypeTree(class Byte) > tree position: line 6 of > tree tpe: Byte > symbol: (final abstract) class Byte in package scala > symbol definition: final abstract class Byte extends (a ClassSymbol) > symbol package: scala > symbol owners: class Byte > call site: constructor $eval in object $eval in package $line19 > == Source file context for tree position == > 3 > 4object $eval { > 5lazyval $result = > $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 > 6lazyval $print: {_}root{_}.java.lang.String = { > 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw > 8 > 9"" > at > scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) > at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) > at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) > at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) > at > 
scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) > at > scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802) > at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799) > at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805) > at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28) > at > scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645) > at > scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413) > at > scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357) > at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357) > at >
[jira] [Created] (SPARK-44042) SPIP: PySpark Test Framework
Amanda Liu created SPARK-44042: -- Summary: SPIP: PySpark Test Framework Key: SPARK-44042 URL: https://issues.apache.org/jira/browse/SPARK-44042 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 3.5.0 Reporter: Amanda Liu Currently, there's no official PySpark test framework, but only various open-source repos and blog posts. Many of these open-source resources are very popular, which demonstrates user-demand for PySpark testing capabilities. [spark-testing-base|https://github.com/holdenk/spark-testing-base] has 1.4k stars, and [chispa|https://github.com/MrPowers/chispa] has 532k downloads/month. However, it can be confusing for users to piece together disparate resources to write their own PySpark tests (see [The Elephant in the Room: How to Write PySpark Tests|https://towardsdatascience.com/the-elephant-in-the-room-how-to-write-pyspark-unit-tests-a5073acabc34]). We can streamline and simplify the testing process by incorporating test features, such as a PySpark Test Base class (which allows tests to share Spark sessions) and test util functions (for example, asserting dataframe and schema equality). Please see the full SPIP document attached: [https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
[ https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-44039: Description: Improve PlanGenerationTestSuite & ProtoToParsedPlanTestSuite, including: - When generating `GOLDEN` files, we should first delete the corresponding directories and then generate new ones, to avoid committing redundant files during the review process. E.g.: we write a test named `make_timestamp_ltz` for an overloaded method, and during the review process the reviewer asks for more tests for the method; the test's name then changes in the next commit, e.g. to `make_timestamp_ltz without timezone`. At this point, if the `queries/function_make_timestamp_ltz.json`, `queries/function_make_timestamp_ltz.proto.bin` and `explain-results/function_make_timestamp_ltz.explain` files for `function_make_timestamp_ltz` are already in the commit, and there are many such files, we generally do not notice the problem, which leads to these stale `queries/function_make_timestamp_ltz.json`, `queries/function_make_timestamp_ltz.proto.bin` and `explain-results/function_make_timestamp_ltz.explain` files being committed by mistake without any impact on the UT. These files are redundant. - Clean up and update some redundant files that were committed by mistake > Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite > > > Key: SPARK-44039 > URL: https://issues.apache.org/jira/browse/SPARK-44039 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > > Improve PlanGenerationTestSuite & ProtoToParsedPlanTestSuite, including: > - When generating `GOLDEN` files, we should first delete the corresponding > directories and then generate new ones, to avoid committing redundant files > during the review process. E.g.: > we write a test named `make_timestamp_ltz` for an overloaded method, and > during the review process the reviewer asks for more tests for the method; > the test's name then changes in the next commit, e.g. to > `make_timestamp_ltz without timezone`. At this point, if the > `queries/function_make_timestamp_ltz.json`, > `queries/function_make_timestamp_ltz.proto.bin` and > `explain-results/function_make_timestamp_ltz.explain` files of > `function_make_timestamp_ltz` are already in the commit, and there are many > of these files, we generally do not notice the problem, which leads to these > stale files being committed by mistake without any impact on the UT. These > files are redundant. > - Clean up and update some redundant files that were committed by mistake -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
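A minimal sketch of the "delete before regenerating" step proposed above (the `queries` and `explain-results` directory names come from the description; the root path and the helper are illustrative, not the actual suite code):
{noformat}
import java.io.File

// Assumed location of the golden files for these suites.
val goldenRoot = new File("connector/connect/common/src/test/resources/query-tests")

// Delete a directory tree, children first.
def deleteRecursively(f: File): Unit = {
  Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
  f.delete()
}

// Wipe and recreate the golden-file directories before regeneration, so a
// renamed test cannot leave stale .json/.proto.bin/.explain files behind.
Seq("queries", "explain-results").foreach { name =>
  val dir = new File(goldenRoot, name)
  deleteRecursively(dir)
  dir.mkdirs()
}
{noformat}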
[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9
[ https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732113#comment-17732113 ] Yang Jie commented on SPARK-44041: -- Waiting until it can be downloaded through Maven. > Upgrade ammonite to 2.5.9 > - > > Key: SPARK-44041 > URL: https://issues.apache.org/jira/browse/SPARK-44041 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > To support Scala 2.12.18 & 2.13.11. > > A tag already exists: > [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
[ https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732112#comment-17732112 ] Ignite TC Bot commented on SPARK-44039: --- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/41572 > Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite > > > Key: SPARK-44039 > URL: https://issues.apache.org/jira/browse/SPARK-44039 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38477) Use error classes in org.apache.spark.storage
[ https://issues.apache.org/jira/browse/SPARK-38477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732111#comment-17732111 ] Ignite TC Bot commented on SPARK-38477: --- User 'bozhang2820' has created a pull request for this issue: https://github.com/apache/spark/pull/41575 > Use error classes in org.apache.spark.storage > - > > Key: SPARK-38477 > URL: https://issues.apache.org/jira/browse/SPARK-38477 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9
[ https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732106#comment-17732106 ] Yang Jie commented on SPARK-44041: -- cc [~dongjoon] > Upgrade ammonite to 2.5.9 > - > > Key: SPARK-44041 > URL: https://issues.apache.org/jira/browse/SPARK-44041 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > To support Scala 2.12.18 & 2.13.11. > > A tag already exists: > [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44041) Upgrade ammonite to 2.5.9
Yang Jie created SPARK-44041: Summary: Upgrade ammonite to 2.5.9 Key: SPARK-44041 URL: https://issues.apache.org/jira/browse/SPARK-44041 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Yang Jie To support Scala 2.12.18 & 2.13.11. A tag already exists: [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Aleksandrov updated SPARK-44040: -- Description: When i try to call count after distinct function for Decimal null field, spark return incorrect result starting from spark 3.4.0. A minimal example to reproduce: import org.apache.spark.sql.types._ import org.apache.spark.sql.\{Column, DataFrame, Dataset, Row, SparkSession} import org.apache.spark.sql.types.\{StringType, StructField, StructType} val schema = StructType( Array( StructField("money", DecimalType(38,6), true), StructField("reference_id", StringType, true) )) val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", lit("df2")) val unionDF: DataFrame = aggDf.union(aggDf1) unionDF.select("money").distinct.show // return correct result unionDF.select("money").distinct.count // return 2 instead of 1 unionDF.select("money").distinct.count == 1 // return false This block of code returns some assertion error and after that an incorrect count (in spark 3.2.1 everything works fine and i get correct result = 1): *scala> unionDF.select("money").distinct.show // return correct result* java.lang.AssertionError: assertion failed: Decimal$DecimalIsFractional while compiling: during phase: globalPhase=terminal, enteringPhase=jvm library version: version 2.12.17 compiler version: version 2.12.17 reconstructed args: -classpath /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar -Yrepl-class-based -Yrepl-outdir /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 last tree to typer: TypeTree(class Byte) tree position: line 6 of tree tpe: Byte symbol: (final abstract) class Byte in package scala symbol definition: final abstract class Byte extends (a ClassSymbol) symbol package: scala symbol owners: class Byte call site: constructor $eval in object $eval in package $line19 == Source file context for tree position == 3 4object $eval { 5lazyval $result = $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 6lazyval $print: {_}root{_}.java.lang.String = { 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw 8 9"" at scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) at scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) at scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) at scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at 
scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799) at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805) at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28) at scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324) at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342) at scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645) at scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413) at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357) at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188) at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357) at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$run$1(UnPickler.scala:96) at scala.reflect.internal.pickling.UnPickler$Scan.run(UnPickler.scala:88) at scala.reflect.internal.pickling.UnPickler.unpickle(UnPickler.scala:47) at scala.tools.nsc.symtab.classfile.ClassfileParser.unpickleOrParseInnerClasses(ClassfileParser.scala:1173) at scala.tools.nsc.symtab.classfile.ClassfileParser.parseClass(ClassfileParser.scala:467) at scala.tools.nsc.symtab.classfile.ClassfileParser.$anonfun$parse$2(ClassfileParser.scala:160) at
[jira] [Created] (SPARK-44040) Incorrect result after count distinct
Aleksandr Aleksandrov created SPARK-44040: - Summary: Incorrect result after count distinct Key: SPARK-44040 URL: https://issues.apache.org/jira/browse/SPARK-44040 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Reporter: Aleksandr Aleksandrov When I try to call count after the distinct function on a Decimal null field, Spark returns an incorrect result starting from Spark 3.4.0. A minimal example to reproduce: import org.apache.spark.sql.types._ import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession} import org.apache.spark.sql.types.{StringType, StructField, StructType} val schema = StructType( Array( StructField("money", DecimalType(38,6), true), StructField("reference_id", StringType, true) )) val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", lit("df2")) val unionDF: DataFrame = aggDf.union(aggDf1) unionDF.select("money").distinct.show // returns the correct result unionDF.select("money").distinct.count // returns 2 instead of 1 unionDF.select("money").distinct.count == 1 // returns false This block of code returns an assertion error and after that an incorrect count (in Spark 3.2.1 everything works fine and I get the correct result = 1): *scala> unionDF.select("money").distinct.show // returns the correct result* java.lang.AssertionError: assertion failed: Decimal$DecimalIsFractional while compiling: during phase: globalPhase=terminal, enteringPhase=jvm library version: version 2.12.17 compiler version: version 2.12.17 reconstructed args: -classpath /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar -Yrepl-class-based -Yrepl-outdir /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 last tree to typer: TypeTree(class Byte) tree position: line 6 of tree tpe: Byte symbol: (final abstract) class Byte in package scala symbol definition: final abstract class Byte extends (a ClassSymbol) symbol package: scala symbol owners: class Byte call site: constructor $eval in object $eval in package $line19 == Source file context for tree position == 3 4 object $eval { 5 lazy val $result = $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 6 lazy val $print: _root_.java.lang.String = { 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw 8 9 "" at scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) at scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) at scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) at scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799) at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805) at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28) at scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324) at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342) at scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645) at scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413) at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357) at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188) at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357) at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$run$1(UnPickler.scala:96) at scala.reflect.internal.pickling.UnPickler$Scan.run(UnPickler.scala:88) at scala.reflect.internal.pickling.UnPickler.unpickle(UnPickler.scala:47) at scala.tools.nsc.symtab.classfile.ClassfileParser.unpickleOrParseInnerClasses(ClassfileParser.scala:1173) at
[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table
[ https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732045#comment-17732045 ] Jia Fan commented on SPARK-43486: - [~panbingkun] Hi, any update on this? > number of files read is incorrect if it is bucket table > --- > > Key: SPARK-43486 > URL: https://issues.apache.org/jira/browse/SPARK-43486 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > Attachments: screenshot-1.png > > > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44036) Cleanup & consolidate tickets to simplify the tasks.
[ https://issues.apache.org/jira/browse/SPARK-44036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44036. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41566 [https://github.com/apache/spark/pull/41566] > Cleanup & consolidate tickets to simplify the tasks. > > > Key: SPARK-44036 > URL: https://issues.apache.org/jira/browse/SPARK-44036 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > > We have so many tickets for pandas API on Spark with Spark Connect, so it > would be great if we can simplify them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44036) Cleanup & consolidate tickets to simplify the tasks.
[ https://issues.apache.org/jira/browse/SPARK-44036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44036: Assignee: Haejoon Lee > Cleanup & consolidate tickets to simplify the tasks. > > > Key: SPARK-44036 > URL: https://issues.apache.org/jira/browse/SPARK-44036 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > We have so many tickets for pandas API on Spark with Spark Connect, so it > would be great if we can simplify them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
BingKun Pan created SPARK-44039: --- Summary: Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite Key: SPARK-44039 URL: https://issues.apache.org/jira/browse/SPARK-44039 Project: Spark Issue Type: Improvement Components: Connect, Tests Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43891) Support SHOW VIEWS IN . when not is not the current selected catalog
[ https://issues.apache.org/jira/browse/SPARK-43891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732031#comment-17732031 ] Jia Fan commented on SPARK-43891: - cc [~cloud_fan] [~dongjoon] > Support SHOW VIEWS IN . when not is not the > current selected catalog > --- > > Key: SPARK-43891 > URL: https://issues.apache.org/jira/browse/SPARK-43891 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-43891) Support SHOW VIEWS IN . when not is not the current selected catalog
[ https://issues.apache.org/jira/browse/SPARK-43891 ] Jia Fan deleted comment on SPARK-43891: - was (Author: fanjia): I can work for this. > Support SHOW VIEWS IN . when not is not the > current selected catalog > --- > > Key: SPARK-43891 > URL: https://issues.apache.org/jira/browse/SPARK-43891 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43891) Support SHOW VIEWS IN . when not is not the current selected catalog
[ https://issues.apache.org/jira/browse/SPARK-43891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732030#comment-17732030 ] Jia Fan commented on SPARK-43891: - Hi [~amaliujia], I have a question about views. I see that Spark added ViewCatalog for DataSourceV2, but it is never used (views cannot be created through ViewCatalog at the moment). As I understand it, this ticket would be implemented on DataSourceV2, so that views in a different catalog can be listed. But if creating views is not supported, what is the point of showing views? > Support SHOW VIEWS IN . when not is not the > current selected catalog > --- > > Key: SPARK-43891 > URL: https://issues.apache.org/jira/browse/SPARK-43891 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
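For context, a sketch of the statement shapes involved (catalog and namespace names are made up; the qualified form is the target of this ticket and does not work today):
{noformat}
// Supported today: namespaces of the current catalog.
spark.sql("SHOW VIEWS IN my_namespace").show()

// Target of this ticket: a namespace of a catalog other than the current one.
spark.sql("SHOW VIEWS IN my_catalog.my_namespace").show()
{noformat}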
[jira] [Resolved] (SPARK-44038) Update YuniKorn docs with v1.3
[ https://issues.apache.org/jira/browse/SPARK-44038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44038. --- Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed Issue resolved by pull request 41571 [https://github.com/apache/spark/pull/41571] > Update YuniKorn docs with v1.3 > -- > > Key: SPARK-44038 > URL: https://issues.apache.org/jira/browse/SPARK-44038 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.5.0, 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44038) Update YuniKorn docs with v1.3
[ https://issues.apache.org/jira/browse/SPARK-44038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44038: - Assignee: Dongjoon Hyun > Update YuniKorn docs with v1.3 > -- > > Key: SPARK-44038 > URL: https://issues.apache.org/jira/browse/SPARK-44038 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43753) Incorrect result of MINUS in spark sql.
[ https://issues.apache.org/jira/browse/SPARK-43753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732012#comment-17732012 ] Jia Fan commented on SPARK-43753: - This seems to be already fixed on the master branch. > Incorrect result of MINUS in spark sql. > --- > > Key: SPARK-43753 > URL: https://issues.apache.org/jira/browse/SPARK-43753 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.3, 3.1.3 >Reporter: Kernel Force >Priority: Major > > sql(""" > with va as ( > select '123' id, 'a' name > union all > select '123' id, 'b' name > ) > select '123' id, 'a' name from va t where t.name = 'a' > minus > select '123' id, 'a' name from va s where s.name = 'b' > """).show > +---+----+ > | id|name| > +---+----+ > |123| a| > +---+----+ > which is expected to be an empty result set. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
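The quoted reproduction, reflowed for readability (a spark-shell session, where `sql` is in scope):
{noformat}
sql("""
  with va as (
    select '123' id, 'a' name
    union all
    select '123' id, 'b' name
  )
  select '123' id, 'a' name from va t where t.name = 'a'
  minus
  select '123' id, 'a' name from va s where s.name = 'b'
""").show
// Affected versions print the ('123', 'a') row; the correct output is empty.
{noformat}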
[jira] [Updated] (SPARK-44038) Update YuniKorn docs with v1.3
[ https://issues.apache.org/jira/browse/SPARK-44038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44038: -- Issue Type: Documentation (was: Improvement) > Update YuniKorn docs with v1.3 > -- > > Key: SPARK-44038 > URL: https://issues.apache.org/jira/browse/SPARK-44038 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44038) Update YuniKorn docs with v1.3
[ https://issues.apache.org/jira/browse/SPARK-44038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44038: -- Affects Version/s: 3.4.1 > Update YuniKorn docs with v1.3 > -- > > Key: SPARK-44038 > URL: https://issues.apache.org/jira/browse/SPARK-44038 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44038) Update YuniKorn docs with v1.3
Dongjoon Hyun created SPARK-44038: - Summary: Update YuniKorn docs with v1.3 Key: SPARK-44038 URL: https://issues.apache.org/jira/browse/SPARK-44038 Project: Spark Issue Type: Improvement Components: Documentation, Kubernetes Affects Versions: 3.5.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43891) Support SHOW VIEWS IN . when not is not the current selected catalog
[ https://issues.apache.org/jira/browse/SPARK-43891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732006#comment-17732006 ] Jia Fan commented on SPARK-43891: - I can work for this. > Support SHOW VIEWS IN . when not is not the > current selected catalog > --- > > Key: SPARK-43891 > URL: https://issues.apache.org/jira/browse/SPARK-43891 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sysolyatin updated SPARK-44037: -- Description: CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limiting the row size properly. For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then # User cannot read a column with size > 10 even if row size <= 100 # User cannot read more than 10 columns even if row size <= 100 I suggest adding an additional option, maxCharsPerRow was: CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly. For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then # User can not read column with size > 10 even if row size <= 100 # User can not read more than 10 columns where each column < 5 chars even if row size <= 100 I suggest to add additional option maxCharsPerRow > Add maxCharsPerRow option for CSV datasource > > > Key: SPARK-44037 > URL: https://issues.apache.org/jira/browse/SPARK-44037 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dmitry Sysolyatin >Priority: Major > > CSV datasource supports maxColumns and maxCharsPerColumn options. But those > two options do not allow limiting the row size properly. > For instance, if I want to limit the row size to be less than or equal to > 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then > # User cannot read a column with size > 10 even if row size <= 100 > # User cannot read more than 10 columns even if row size <= 100 > I suggest adding an additional option, maxCharsPerRow -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sysolyatin updated SPARK-44037: -- Description: CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly. For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then # User can not read column with size > 10 even if row size <= 100 # User can not read more than 10 columns where each column < 5 chars even if row size <= 100 I suggest to add additional option maxCharsPerRow was: CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly. For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then # User can not read column with size > 10 even if row size <= 100 # User can not read more then 10 columns where each column < 5 chars even if row size <= 100 I suggest to add additional option maxCharsPerRow > Add maxCharsPerRow option for CSV datasource > > > Key: SPARK-44037 > URL: https://issues.apache.org/jira/browse/SPARK-44037 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dmitry Sysolyatin >Priority: Major > > CSV datasource supports maxColumns and maxCharsPerColumn options. But those > two options do not allow limit row size properly. > For instance, if I want to limit the row size to be less than or equal to > 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then > # User can not read column with size > 10 even if row size <= 100 > # User can not read more than 10 columns where each column < 5 chars even if > row size <= 100 > I suggest to add additional option maxCharsPerRow -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-44018) Improve the hashCode for Some DS V2 Expression
[ https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731988#comment-17731988 ] jiaan.geng edited comment on SPARK-44018 at 6/13/23 8:38 AM: - [~dongjoon]Yes. I have created PR for this. https://github.com/apache/spark/pull/41543 was (Author: beliefer): [~dongjoon]Yes. I have created PR for this. > Improve the hashCode for Some DS V2 Expression > -- > > Key: SPARK-44018 > URL: https://issues.apache.org/jira/browse/SPARK-44018 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not > good enough. UserDefinedAggregateFunc and GeneralAggregateFunc missing > hashCode() -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sysolyatin updated SPARK-44037: -- Description: CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly. For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then # User can not read column with size > 10 even if row size <= 100 # User can not read more then 10 columns where each column < 5 chars even if row size <= 100 I suggest to add additional option maxCharsPerRow was: CSV datasource supports maxColumns and maxCharsPerColumn options. But those two option does not allow restrict row size properly. For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then # User can not read column with size > 10 even if row size <= 100 # User can not read more then 10 columns where each column < 5 chars even if row size <= 100 I suggest to add additional option maxCharsPerRow > Add maxCharsPerRow option for CSV datasource > > > Key: SPARK-44037 > URL: https://issues.apache.org/jira/browse/SPARK-44037 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dmitry Sysolyatin >Priority: Major > > CSV datasource supports maxColumns and maxCharsPerColumn options. But those > two options do not allow limit row size properly. > For instance, if I want to limit the row size to be less than or equal to > 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then > # User can not read column with size > 10 even if row size <= 100 > # User can not read more then 10 columns where each column < 5 chars even if > row size <= 100 > I suggest to add additional option maxCharsPerRow -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sysolyatin updated SPARK-44037:
--
Description:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow restricting the row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User cannot read a column with size > 10 even if row size <= 100
# User cannot read more than 10 columns where each column is < 5 chars, even if row size <= 100
I suggest adding an additional option, maxCharsPerRow.

was: CSV datasource supports maxColumns and maxCharsPerColumn options. But those two option does not allow restrict row size properly. For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then # User can not read column with size > 10 even if row size <= 100 # User can not read more then 10 columns where each column < 5 chars even if row size <= 100

> Add maxCharsPerRow option for CSV datasource
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Dmitry Sysolyatin
> Priority: Major
>
> CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow restricting the row size properly.
> For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
> # User cannot read a column with size > 10 even if row size <= 100
> # User cannot read more than 10 columns where each column is < 5 chars, even if row size <= 100
> I suggest adding an additional option, maxCharsPerRow.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44018) Improve the hashCode for Some DS V2 Expression
[ https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731988#comment-17731988 ] jiaan.geng commented on SPARK-44018:
-
[~dongjoon] Yes. I have created a PR for this.

> Improve the hashCode for Some DS V2 Expression
>
> Key: SPARK-44018
> URL: https://issues.apache.org/jira/browse/SPARK-44018
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Priority: Major
>
> The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not good enough. UserDefinedAggregateFunc and GeneralAggregateFunc are missing hashCode().
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
Dmitry Sysolyatin created SPARK-44037:
-
Summary: Add maxCharsPerRow option for CSV datasource
Key: SPARK-44037
URL: https://issues.apache.org/jira/browse/SPARK-44037
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Dmitry Sysolyatin

CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow restricting the row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User cannot read a column with size > 10 even if row size <= 100
# User cannot read more than 10 columns where each column is < 5 chars, even if row size <= 100
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43619) Enable DataFrameSlowParityTests.test_udt
[ https://issues.apache.org/jira/browse/SPARK-43619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43619:
Fix Version/s: 3.5.0

> Enable DataFrameSlowParityTests.test_udt
>
> Key: SPARK-43619
> URL: https://issues.apache.org/jira/browse/SPARK-43619
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Repro:
> {code:python}
> import pandas as pd
> import pyspark.pandas as ps
> from pyspark.ml.linalg import SparseVector
>
> # A pandas DataFrame with a UDT (SparseVector) column, converted to pandas-on-Spark.
> sparse_values = {0: 0.1, 1: 1.1}
> sparse_vector = SparseVector(len(sparse_values), sparse_values)
> pdf = pd.DataFrame({"a": [sparse_vector], "b": [10]})
> psdf = ps.from_pandas(pdf) {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43661) Enable ReshapeParityTests.test_get_dummies_date_datetime
[ https://issues.apache.org/jira/browse/SPARK-43661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43661:
Fix Version/s: 3.5.0

> Enable ReshapeParityTests.test_get_dummies_date_datetime
>
> Key: SPARK-43661
> URL: https://issues.apache.org/jira/browse/SPARK-43661
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Enable ReshapeParityTests.test_get_dummies_date_datetime
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43619) Enable DataFrameSlowParityTests.test_udt
[ https://issues.apache.org/jira/browse/SPARK-43619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43619.
-
Resolution: Resolved

This is covered by SPARK-44036.

> Enable DataFrameSlowParityTests.test_udt
>
> Key: SPARK-43619
> URL: https://issues.apache.org/jira/browse/SPARK-43619
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Repro:
> {code:python}
> import pandas as pd
> import pyspark.pandas as ps
> from pyspark.ml.linalg import SparseVector
>
> # A pandas DataFrame with a UDT (SparseVector) column, converted to pandas-on-Spark.
> sparse_values = {0: 0.1, 1: 1.1}
> sparse_vector = SparseVector(len(sparse_values), sparse_values)
> pdf = pd.DataFrame({"a": [sparse_vector], "b": [10]})
> psdf = ps.from_pandas(pdf) {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43661) Enable ReshapeParityTests.test_get_dummies_date_datetime
[ https://issues.apache.org/jira/browse/SPARK-43661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43661.
-
Resolution: Resolved

This is covered by SPARK-44036.

> Enable ReshapeParityTests.test_get_dummies_date_datetime
>
> Key: SPARK-43661
> URL: https://issues.apache.org/jira/browse/SPARK-43661
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Enable ReshapeParityTests.test_get_dummies_date_datetime
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
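For context, a small pandas-on-Spark sketch of the operation this parity test exercises (the sample data is made up); with the consolidated fix it should behave the same over Spark Connect as over classic Spark:

{code:python}
# Invented sample data; one-hot encode a date column with the pandas
# API on Spark, which is what the get_dummies parity test covers.
import datetime
import pyspark.pandas as ps

psdf = ps.DataFrame({
    "d": [datetime.date(2023, 6, 1), datetime.date(2023, 6, 2)],
    "v": [1, 2],
})
print(ps.get_dummies(psdf, columns=["d"]))
{code}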
[jira] [Created] (SPARK-44036) Cleanup & consolidate tickets to simplify the tasks.
Haejoon Lee created SPARK-44036:
---
Summary: Cleanup & consolidate tickets to simplify the tasks.
Key: SPARK-44036
URL: https://issues.apache.org/jira/browse/SPARK-44036
Project: Spark
Issue Type: Sub-task
Components: Connect, Pandas API on Spark
Affects Versions: 3.5.0
Reporter: Haejoon Lee

We have so many tickets for the pandas API on Spark with Spark Connect, so it would be great if we could consolidate and simplify them.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43710) Support functions.date_part for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43710.
-
Resolution: Duplicate

It is duplicated by SPARK-43705.

> Support functions.date_part for Spark Connect
>
> Key: SPARK-43710
> URL: https://issues.apache.org/jira/browse/SPARK-43710
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Repro: run `TimedeltaIndexParityTests.test_properties`
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
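For reference, a minimal sketch of the function being tracked (the column and data are invented, assuming Spark 3.5 where functions.date_part is available). In PySpark a plain string argument is treated as a column name, so the field is wrapped in lit():

{code:python}
# Invented example data; extract a field from a timestamp column.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("2023-06-13 08:38:00",)], ["ts"])
df.select(
    F.date_part(F.lit("YEAR"), F.col("ts").cast("timestamp")).alias("year")
).show()
{code}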
[jira] [Created] (SPARK-44035) Split `pyspark.pandas.tests.connect.test_parity_ops_on_diff_frames_slow`
Ruifeng Zheng created SPARK-44035:
-
Summary: Split `pyspark.pandas.tests.connect.test_parity_ops_on_diff_frames_slow`
Key: SPARK-44035
URL: https://issues.apache.org/jira/browse/SPARK-44035
Project: Spark
Issue Type: Sub-task
Components: Connect, Pandas API on Spark, Tests
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43653) Enable GroupBySlowParityTests.test_split_apply_combine_on_series
[ https://issues.apache.org/jira/browse/SPARK-43653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43653.
-
Resolution: Duplicate

This is duplicated by SPARK-43445.

> Enable GroupBySlowParityTests.test_split_apply_combine_on_series
>
> Key: SPARK-43653
> URL: https://issues.apache.org/jira/browse/SPARK-43653
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Enable GroupBySlowParityTests.test_split_apply_combine_on_series
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
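As a concrete sketch of the split-apply-combine pattern the test name refers to (data invented): split a series by group, apply an aggregation per group, and combine the per-group results:

{code:python}
# Invented data; classic split-apply-combine on a series with the
# pandas API on Spark.
import pyspark.pandas as ps

psdf = ps.DataFrame({"g": ["a", "a", "b"], "x": [1.0, 2.0, 3.0]})
print(psdf.groupby("g")["x"].mean())  # one aggregated value per group
{code}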
[jira] [Resolved] (SPARK-43652) Enable GroupBy.rank with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43652.
-
Resolution: Duplicate

This is duplicated by SPARK-43611.

> Enable GroupBy.rank with Spark Connect
>
> Key: SPARK-43652
> URL: https://issues.apache.org/jira/browse/SPARK-43652
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Enable GroupBy.rank with Spark Connect
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
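A minimal sketch of the operation in question (data invented), which the duplicate ticket tracks enabling over Spark Connect:

{code:python}
# Invented data; per-group ranking with the pandas API on Spark.
import pyspark.pandas as ps

psdf = ps.DataFrame({"g": ["a", "a", "b"], "x": [10, 20, 30]})
print(psdf.groupby("g")["x"].rank())  # ranks computed within each group
{code}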