[jira] [Updated] (SPARK-45943) DataSourceV2Relation.computeStats throws IllegalStateException in test mode

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45943:
---
Labels: pull-request-available  (was: )

> DataSourceV2Relation.computeStats throws IllegalStateException in test mode
> ---
>
> Key: SPARK-45943
> URL: https://issues.apache.org/jira/browse/SPARK-45943
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Priority: Major
>  Labels: pull-request-available
>
> This issue surfaces when the new unit test from the PR for 
> [SPARK-45866|https://github.com/apache/spark/pull/43824] is added.






[jira] [Commented] (SPARK-42694) Data duplication and loss occur after executing 'insert overwrite...' in Spark 3.1.1

2023-11-16 Thread FengZhou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787074#comment-17787074
 ] 

FengZhou commented on SPARK-42694:
--

No. Everything is OK, all the tasks are successful.

> Data duplication and loss occur after executing 'insert overwrite...' in 
> Spark 3.1.1
> 
>
> Key: SPARK-42694
> URL: https://issues.apache.org/jira/browse/SPARK-42694
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.1
> Environment: Spark 3.1.1
> Hadoop 3.2.1
> Hive 3.1.2
>Reporter: FengZhou
>Priority: Critical
>  Labels: shuffle, spark
> Attachments: image-2023-03-07-15-59-08-818.png, 
> image-2023-03-07-15-59-27-665.png
>
>
> We are currently using Spark version 3.1.1 in our production environment. We 
> have noticed that occasionally, after executing 'insert overwrite ... 
> select', the resulting data is inconsistent, with some data being duplicated 
> or lost. This issue does not occur all the time and seems to be more 
> prevalent on large tables with tens of millions of records.
> We compared the execution plans for two runs of the same SQL and found that 
> they were identical. In the case where the SQL was executed successfully, the 
> amount of data being written and read during the shuffle stage was the same. 
> However, in the case where the problem occurred, the amount of data being 
> written and read during the shuffle stage was different. Please see the 
> attached screenshots for the write/read data during the shuffle stage.
>  
> Normal SQL:
> !image-2023-03-07-15-59-08-818.png!
> SQL with issues:
> !image-2023-03-07-15-59-27-665.png!
>  
> Is this problem caused by a bug in version 3.1.1, specifically (SPARK-34534): 
> 'New protocol FetchShuffleBlocks in OneForOneBlockFetcher lead to data loss 
> or correctness'? Or is it caused by something else? What could be the root 
> cause of this problem?






[jira] [Updated] (SPARK-45971) Correct the package name of `SparkCollectionUtils`

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45971:
---
Labels: pull-request-available  (was: )

> Correct the package name of `SparkCollectionUtils`
> --
>
> Key: SPARK-45971
> URL: https://issues.apache.org/jira/browse/SPARK-45971
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45962) Remove treatEmptyValuesAsNulls and use nullValue option instead in XML

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45962.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43852
[https://github.com/apache/spark/pull/43852]

> Remove treatEmptyValuesAsNulls and use nullValue option instead in XML
> --
>
> Key: SPARK-45962
> URL: https://issues.apache.org/jira/browse/SPARK-45962
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Today, we offer two available options to handle null values. To enhance user 
> clarity and simplify usage, we propose consolidating these into a single 
> option. We recommend retaining the {{nullValue}} option due to its broader 
> semantic scope. 
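> For example, the empty-value behavior can be expressed with {{nullValue}} 
> alone (a minimal sketch; the row tag and path are illustrative):
> {code:java}
> // Treat empty XML values as nulls via the retained nullValue option,
> // instead of the removed treatEmptyValuesAsNulls=true:
> val df = spark.read
>   .format("xml")
>   .option("rowTag", "book")
>   .option("nullValue", "")
>   .load("/data/books.xml")
> {code}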






[jira] [Created] (SPARK-45971) Correct the package name of `SparkCollectionUtils`

2023-11-16 Thread Yang Jie (Jira)
Yang Jie created SPARK-45971:


 Summary: Correct the package name of `SparkCollectionUtils`
 Key: SPARK-45971
 URL: https://issues.apache.org/jira/browse/SPARK-45971
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Commented] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"

2023-11-16 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787053#comment-17787053
 ] 

Yang Jie commented on SPARK-45699:
--

[~hannahkamundson] Is there any progress on this ticket?

> Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it 
> loses precision"
> --
>
> Key: SPARK-45699
> URL: https://issues.apache.org/jira/browse/SPARK-45699
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: 
> msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold
> [error]       val threshold = max(speculationMultiplier * medianDuration, 
> minTimeToSpeculation)
> [error]                                                                   ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: 
> msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks
> [error]       foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, 
> customizedThreshold = true)
> [error]                                                            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48:
>  Widening conversion from Int to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: 
> msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getInt(i)
> [error]                                                ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49:
>  Widening conversion from Long to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: 
> msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getLong(i)
> [error]                                                 ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: 
> msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble
> [error]   override def getDouble(i: Int): Double = getLong(i)
> [error]                                                   ^ {code}
>  
>  
> An example of the compilation warning is shown above; there are probably over 
> 100 similar cases that need to be fixed.
>  
>  
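> A representative fix is to make each widening conversion explicit, e.g. (a 
> minimal standalone sketch with made-up values, not the actual TaskSetManager 
> fields):
> {code:java}
> // Deprecated: Long operands widen implicitly to Double in the max(...) call.
> val speculationMultiplier: Double = 1.5
> val medianDuration: Long = 2000L
> val minTimeToSpeculation: Long = 100L
> // Before: math.max(speculationMultiplier * medianDuration, minTimeToSpeculation)
> // After: write the conversion explicitly, as the compiler suggests.
> val threshold = math.max(speculationMultiplier * medianDuration.toDouble,
>   minTimeToSpeculation.toDouble)
> {code}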






[jira] [Resolved] (SPARK-45966) Add missing methods for API reference.

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45966.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43860
[https://github.com/apache/spark/pull/43860]

> Add missing methods for API reference.
> --
>
> Key: SPARK-45966
> URL: https://issues.apache.org/jira/browse/SPARK-45966
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45966) Add missing methods for API reference.

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45966:
-

Assignee: Haejoon Lee

> Add missing methods for API reference.
> --
>
> Key: SPARK-45966
> URL: https://issues.apache.org/jira/browse/SPARK-45966
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45968) Upgrade github docker action to latest version

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45968.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43862
[https://github.com/apache/spark/pull/43862]

> Upgrade github docker action to latest version
> --
>
> Key: SPARK-45968
> URL: https://issues.apache.org/jira/browse/SPARK-45968
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-45970) Provide partitioning expressions in Java as same as Scala

2023-11-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45970:


 Summary: Provide partitioning expressions in Java as same as Scala
 Key: SPARK-45970
 URL: https://issues.apache.org/jira/browse/SPARK-45970
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


See https://github.com/apache/spark/pull/43858.

Once Scala 3 is out, we can support the same style of partitioning expressions 
in Java, such as:

{code}
import static org.apache.spark.sql.functions.partitioning.*;
{code}
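
For reference, a sketch of the Scala-side usage this would mirror (assuming 
the `functions.partitioning` object from SPARK-45965; the DataFrame, table, 
and column names are illustrative):

{code}
import org.apache.spark.sql.functions.{col, partitioning}

// df: any DataFrame with a timestamp column `event_ts` (assumed)
df.writeTo("catalog.db.events")
  .partitionedBy(partitioning.years(col("event_ts")))
  .create()
{code}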







[jira] [Updated] (SPARK-45969) Document configuration change of executor failure tracker

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45969:
---
Labels: pull-request-available  (was: )

> Document configuration change of executor failure tracker
> -
>
> Key: SPARK-45969
> URL: https://issues.apache.org/jira/browse/SPARK-45969
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45969) Document configuration change of executor failure tracker

2023-11-16 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-45969:
-

 Summary: Document configuration change of executor failure tracker
 Key: SPARK-45969
 URL: https://issues.apache.org/jira/browse/SPARK-45969
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 3.5.0
Reporter: Cheng Pan









[jira] [Assigned] (SPARK-45762) Shuffle managers defined in user jars are not available for some launch modes

2023-11-16 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-45762:
---

Assignee: Alessandro Bellina

> Shuffle managers defined in user jars are not available for some launch modes
> -
>
> Key: SPARK-45762
> URL: https://issues.apache.org/jira/browse/SPARK-45762
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Alessandro Bellina
>Assignee: Alessandro Bellina
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Starting a spark job in standalone mode with a custom `ShuffleManager` 
> provided in a jar via `--jars` does not work. This can also be experienced in 
> local-cluster mode.
> The approach that works consistently is to copy the jar containing the custom 
> `ShuffleManager` to a specific location in each node then add it to 
> `spark.driver.extraClassPath` and `spark.executor.extraClassPath`, but we 
> would like to move away from setting extra configurations unnecessarily.
> Example:
> {code:java}
> $SPARK_HOME/bin/spark-shell \
>   --master spark://127.0.0.1:7077 \
>   --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
>   --jars user-code.jar
> {code}
> This yields `java.lang.ClassNotFoundException` in the executors.
> {code:java}
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1915)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:436)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:425)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.spark.examples.TestShuffleManager
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
>   at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:467)
>   at 
> org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
>   at 
> org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
>   at org.apache.spark.util.Utils$.classForName(Utils.scala:95)
>   at 
> org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2574)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366)
>   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:255)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:487)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
>   at 
> java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   ... 4 more
> {code}
> We can change our command to use `extraClassPath`:
> {code:java}
> $SPARK_HOME/bin/spark-shell \
>   --master spark://127.0.0.1:7077 \
>   --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
>   --conf spark.driver.extraClassPath=user-code.jar \
>  --conf spark.executor.extraClassPath=user-code.jar
> {code}
> Success after adding the jar to `extraClassPath`:
> {code:java}
> 23/10/26 12:58:26 INFO TransportClientFactory: Successfully created 
> connection to localhost/127.0.0.1:33053 after 7 ms (0 ms spent in bootstraps)
> 23/10/26 12:58:26 WARN TestShuffleManager: Instantiated TestShuffleManager!!
> 23/10/26 12:58:26 INFO DiskBlockManager: Created local directory at 
> /tmp/spark-cb101b05-c4b7-4ba9-8b3d-5b23baa7cb46/executor-5d5335dd-c116-4211-9691-87d8566017fd/blockmgr-2fcb1ab2-d886--8c7f-9dca2c880c2c
> {code}
> We would like to change startup order such that the original command 
> succeeds, without specifying `extraClassPath`:
> {code:java}
> $SPARK_HOME/bin/spark-shell \
>   --master spark://127.0.0.1:7077 \
>   --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
>   --jars user-code.jar
> {code}
> Proposed changes:
> Refactor code so we initialize the `ShuffleManager` later, after jars have 
> been localized.
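> For reference, a minimal sketch of what a user-supplied manager like 
> `TestShuffleManager` could look like (signatures as in Spark 3.5; the actual 
> test class is not shown in this ticket) simply delegates to the built-in 
> sort-based shuffle:
> {code:java}
> package org.apache.spark.examples
> 
> import org.apache.spark.{ShuffleDependency, SparkConf, TaskContext}
> import org.apache.spark.internal.Logging
> import org.apache.spark.shuffle._
> import org.apache.spark.shuffle.sort.SortShuffleManager
> 
> // Logs on construction (matching the WARN line above) and delegates all
> // shuffle work to the default sort-based implementation.
> class TestShuffleManager(conf: SparkConf) extends ShuffleManager with Logging {
>   logWarning("Instantiated TestShuffleManager!!")
>   private val delegate = new SortShuffleManager(conf)
> 
>   override def registerShuffle[K, V, C](
>       shuffleId: Int, dependency: ShuffleDependency[K, V, C]): ShuffleHandle =
>     delegate.registerShuffle(shuffleId, dependency)
> 
>   override def getWriter[K, V](
>       handle: ShuffleHandle, mapId: Long, context: TaskContext,
>       metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V] =
>     delegate.getWriter(handle, mapId, context, metrics)
> 
>   override def getReader[K, C](
>       handle: ShuffleHandle, startMapIndex: Int, endMapIndex: Int,
>       startPartition: Int, endPartition: Int, context: TaskContext,
>       metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C] =
>     delegate.getReader(handle, startMapIndex, endMapIndex, startPartition,
>       endPartition, context, metrics)
> 
>   override def unregisterShuffle(shuffleId: Int): Boolean =
>     delegate.unregisterShuffle(shuffleId)
> 
>   override def shuffleBlockResolver: ShuffleBlockResolver =
>     delegate.shuffleBlockResolver
> 
>   override def stop(): Unit = delegate.stop()
> }
> {code}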

[jira] [Resolved] (SPARK-45762) Shuffle managers defined in user jars are not available for some launch modes

2023-11-16 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-45762.
-
Resolution: Fixed

Issue resolved by pull request 43627
[https://github.com/apache/spark/pull/43627]

> Shuffle managers defined in user jars are not available for some launch modes
> -
>
> Key: SPARK-45762
> URL: https://issues.apache.org/jira/browse/SPARK-45762
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Alessandro Bellina
>Assignee: Alessandro Bellina
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Starting a spark job in standalone mode with a custom `ShuffleManager` 
> provided in a jar via `--jars` does not work. This can also be experienced in 
> local-cluster mode.
> The approach that works consistently is to copy the jar containing the custom 
> `ShuffleManager` to a specific location in each node then add it to 
> `spark.driver.extraClassPath` and `spark.executor.extraClassPath`, but we 
> would like to move away from setting extra configurations unnecessarily.
> Example:
> {code:java}
> $SPARK_HOME/bin/spark-shell \
>   --master spark://127.0.0.1:7077 \
>   --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
>   --jars user-code.jar
> {code}
> This yields `java.lang.ClassNotFoundException` in the executors.
> {code:java}
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1915)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:436)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:425)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.spark.examples.TestShuffleManager
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
>   at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:467)
>   at 
> org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
>   at 
> org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
>   at org.apache.spark.util.Utils$.classForName(Utils.scala:95)
>   at 
> org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2574)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366)
>   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:255)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:487)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
>   at 
> java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   ... 4 more
> {code}
> We can change our command to use `extraClassPath`:
> {code:java}
> $SPARK_HOME/bin/spark-shell \
>   --master spark://127.0.0.1:7077 \
>   --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
>   --conf spark.driver.extraClassPath=user-code.jar \
>  --conf spark.executor.extraClassPath=user-code.jar
> {code}
> Success after adding the jar to `extraClassPath`:
> {code:java}
> 23/10/26 12:58:26 INFO TransportClientFactory: Successfully created 
> connection to localhost/127.0.0.1:33053 after 7 ms (0 ms spent in bootstraps)
> 23/10/26 12:58:26 WARN TestShuffleManager: Instantiated TestShuffleManager!!
> 23/10/26 12:58:26 INFO DiskBlockManager: Created local directory at 
> /tmp/spark-cb101b05-c4b7-4ba9-8b3d-5b23baa7cb46/executor-5d5335dd-c116-4211-9691-87d8566017fd/blockmgr-2fcb1ab2-d886--8c7f-9dca2c880c2c
> {code}
> We would like to change startup order such that the original command 
> succeeds, without specifying `extraClassPath`:
> {code:java}
> $SPARK_HOME/bin/spark-shell \
>   --master spark://127.0.0.1:7077 \
>   --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
>   --jars user-code.jar
> {code}
> Proposed changes:
> Refactor code so we initialize the `ShuffleManager` later, after jars have 
> been localized.

[jira] [Updated] (SPARK-44021) Add spark.sql.files.maxPartitionNum

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44021:
---
Labels: pull-request-available  (was: )

> Add spark.sql.files.maxPartitionNum
> ---
>
> Key: SPARK-44021
> URL: https://issues.apache.org/jira/browse/SPARK-44021
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>







[jira] [Updated] (SPARK-45966) Add missing methods for API reference.

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45966:
---
Labels: pull-request-available  (was: )

> Add missing methods for API reference.
> --
>
> Key: SPARK-45966
> URL: https://issues.apache.org/jira/browse/SPARK-45966
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45964.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43856
[https://github.com/apache/spark/pull/43856]

> Remove private[sql] in XML and JSON package under catalyst package
> --
>
> Key: SPARK-45964
> URL: https://issues.apache.org/jira/browse/SPARK-45964
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> catalyst is internal, so we don't need to annotate them as private[sql]






[jira] [Updated] (SPARK-45967) Upgrade jackson to 2.16.0

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45967:
---
Labels: pull-request-available  (was: )

> Upgrade jackson to 2.16.0
> -
>
> Key: SPARK-45967
> URL: https://issues.apache.org/jira/browse/SPARK-45967
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45964:
-

Assignee: Hyukjin Kwon

> Remove private[sql] in XML and JSON package under catalyst package
> --
>
> Key: SPARK-45964
> URL: https://issues.apache.org/jira/browse/SPARK-45964
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> catalyst is internal, so we don't need to annotate them as private[sql]






[jira] [Created] (SPARK-45967) Upgrade jackson to 2.16.0

2023-11-16 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-45967:
---

 Summary: Upgrade jackson to 2.16.0
 Key: SPARK-45967
 URL: https://issues.apache.org/jira/browse/SPARK-45967
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Created] (SPARK-45966) Add missing methods for API reference.

2023-11-16 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-45966:
---

 Summary: Add missing methods for API reference.
 Key: SPARK-45966
 URL: https://issues.apache.org/jira/browse/SPARK-45966
 Project: Spark
  Issue Type: Bug
  Components: Documentation, Pandas API on Spark
Affects Versions: 4.0.0
Reporter: Haejoon Lee









[jira] [Updated] (SPARK-45965) Move DSv2 partitioning expressions into functions.partitioning

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45965:
---
Labels: pull-request-available  (was: )

> Move DSv2 partitioning expressions into functions.partitioning
> --
>
> Key: SPARK-45965
> URL: https://issues.apache.org/jira/browse/SPARK-45965
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We weren't able to move those partitioning expressions into a nested object 
> because of a Scala 2.12 limitation. Now we're able to do it with Scala 2.13.
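> Concretely, a sketch of the resulting import change (the function names are 
> examples based on this ticket's title):
> {code:java}
> // Before: partition transform functions sat directly on `functions`
> import org.apache.spark.sql.functions.{bucket, years}
> 
> // After: they live in the nested `partitioning` object, which Scala 2.13
> // now allows
> import org.apache.spark.sql.functions.partitioning.{bucket, years}
> {code}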






[jira] [Assigned] (SPARK-45952) Use built-in math constant in math functions

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45952:


Assignee: Ruifeng Zheng

> Use built-in math constant in math functions 
> -
>
> Key: SPARK-45952
> URL: https://issues.apache.org/jira/browse/SPARK-45952
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45952) Use built-in math constant in math functions

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45952.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43837
[https://github.com/apache/spark/pull/43837]

> Use built-in math constant in math functions 
> -
>
> Key: SPARK-45952
> URL: https://issues.apache.org/jira/browse/SPARK-45952
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45964:
---
Labels: pull-request-available  (was: )

> Remove private[sql] in XML and JSON package under catalyst package
> --
>
> Key: SPARK-45964
> URL: https://issues.apache.org/jira/browse/SPARK-45964
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> catalyst is internal, so we don't need to annotate them as private[sql]






[jira] [Updated] (SPARK-40909) Reuse the broadcast exchange for bloom filter

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-40909:
---
Labels: pull-request-available  (was: )

> Reuse the broadcast exchange for bloom filter
> -
>
> Key: SPARK-40909
> URL: https://issues.apache.org/jira/browse/SPARK-40909
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>
> Currently, if the creation side of the bloom filter can be broadcast, Spark 
> cannot inject a bloom filter or InSubquery filter into the application side.
> In fact, we can inject a bloom filter that reuses the broadcast exchange 
> and improves performance.






[jira] [Updated] (SPARK-44669) Parquet/ORC files written using Hive Serde should have a file extension

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44669:
---
Labels: pull-request-available  (was: )

> Parquet/ORC files written using Hive Serde should have a file extension
> 
>
> Key: SPARK-44669
> URL: https://issues.apache.org/jira/browse/SPARK-44669
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-45912) Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45912:


Assignee: Shujing Yang

> Enhancement  of XSDToSchema API: Change to HDFS API for cloud storage 
> accessibility
> ---
>
> Key: SPARK-45912
> URL: https://issues.apache.org/jira/browse/SPARK-45912
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
>
> Previously, it utilized `java.nio.path`, which limited file reading to local 
> file systems only. By changing this to an HDFS-compatible API, we now enable 
> the XSDToSchema function to access files in cloud storage.
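> A sketch of the intended usage (the class location and {{read}} signature 
> are assumed from the migrated spark-xml API; the bucket path is illustrative):
> {code:java}
> import org.apache.hadoop.fs.Path
> import org.apache.spark.sql.execution.datasources.xml.XSDToSchema
> 
> // With an HDFS-compatible API, the XSD no longer has to be on the local
> // filesystem; any supported scheme (s3a://, abfs://, hdfs://, ...) works.
> val schema = XSDToSchema.read(new Path("s3a://my-bucket/schemas/books.xsd"))
> {code}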






[jira] [Updated] (SPARK-44704) Cleanup shuffle files from host node after migration due to graceful decommissioning

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44704:
---
Labels: pull-request-available  (was: )

> Cleanup shuffle files from host node after migration due to graceful 
> decommissioning
> 
>
> Key: SPARK-44704
> URL: https://issues.apache.org/jira/browse/SPARK-44704
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Affects Versions: 3.4.1
>Reporter: Deependra Patel
>Priority: Minor
>  Labels: pull-request-available
>
> Although these files will be deleted at the end of the application by the 
> external shuffle service, doing this early can free up resources and help 
> long-running applications avoid running out of disk space.






[jira] [Resolved] (SPARK-45912) Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45912.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43789
[https://github.com/apache/spark/pull/43789]

> Enhancement  of XSDToSchema API: Change to HDFS API for cloud storage 
> accessibility
> ---
>
> Key: SPARK-45912
> URL: https://issues.apache.org/jira/browse/SPARK-45912
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Previously, it utilized `java.nio.path`, which limited file reading to local 
> file systems only. By changing this to an HDFS-compatible API, we now enable 
> the XSDToSchema function to access files in cloud storage.






[jira] [Created] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package

2023-11-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45964:


 Summary: Remove private[sql] in XML and JSON package under 
catalyst package
 Key: SPARK-45964
 URL: https://issues.apache.org/jira/browse/SPARK-45964
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


catalyst is internal, so we don't need to annotate them as private[sql]






[jira] [Updated] (SPARK-45963) Restore documentation for DSv2 API

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45963:
---
Labels: pull-request-available  (was: )

> Restore documentation for DSv2 API
> --
>
> Key: SPARK-45963
> URL: https://issues.apache.org/jira/browse/SPARK-45963
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> DSv2 documentation was mistakenly removed by 
> https://github.com/apache/spark/pull/38392. It used to exist in 3.3.0: 
> https://spark.apache.org/docs/3.3.0/api/scala/org/apache/spark/sql/connector/catalog/index.html






[jira] [Created] (SPARK-45963) Restore documentation for DSv2 API

2023-11-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45963:


 Summary: Restore documentation for DSv2 API
 Key: SPARK-45963
 URL: https://issues.apache.org/jira/browse/SPARK-45963
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.1, 4.0.0
Reporter: Hyukjin Kwon


DSv2 documentation was mistakenly removed by 
https://github.com/apache/spark/pull/38392. It used to exist in 3.3.0: 
https://spark.apache.org/docs/3.3.0/api/scala/org/apache/spark/sql/connector/catalog/index.html






[jira] [Updated] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45959:
---
Labels: pull-request-available  (was: )

> Abusing DataSet.withColumn can cause huge tree with severe perf degradation
> ---
>
> Key: SPARK-45959
> URL: https://issues.apache.org/jira/browse/SPARK-45959
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Priority: Minor
>  Labels: pull-request-available
>
> Though the documentation clearly recommends adding all columns in a single 
> shot, in reality it is difficult to expect customers to modify their code: in 
> Spark 2 the analyzer rules did not do deep tree traversal, while in Spark 3 
> the plans are cloned before being handed to the analyzer, optimizer, etc., 
> which was not the case in Spark 2.
> Together these changes have increased query time from 5 minutes to 2-3 hours.
> Often the columns are added to the plan via some for-loop logic that just 
> keeps adding new computations based on some rule.
> So, my suggestion is to add an initial check in the withColumn API, before 
> creating a new projection: if all the existing columns are still being 
> projected, and the new column's expression depends not on the output of the 
> top node but on its child, then instead of adding a new Project the column 
> can be added to the existing node.
> For a start, maybe we can handle just the Project node, as sketched below.
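> To illustrate the pattern (a minimal sketch; the loop count and column names 
> are made up), the single-shot alternative is {{withColumns}}:
> {code:java}
> import org.apache.spark.sql.functions.col
> 
> // df: any DataFrame with a numeric column `value` (assumed)
> // Anti-pattern: every withColumn call wraps the plan in another Project,
> // so the analyzed tree (and analysis cost) grows with each iteration.
> var slow = df
> for (i <- 0 until 500) {
>   slow = slow.withColumn(s"c$i", col("value") + i)
> }
> 
> // Single shot: one withColumns call adds all columns via a single Project.
> val fast = df.withColumns(
>   (0 until 500).map(i => s"c$i" -> (col("value") + i)).toMap)
> {code}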






[jira] [Assigned] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and make the common-utils module able to run tests on GitHub Actions

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45950:


Assignee: Yang Jie

> Fix IvyTestUtils#createIvyDescriptor function and make the common-utils 
> module able to run tests on GitHub Actions
> -
>
> Key: SPARK-45950
> URL: https://issues.apache.org/jira/browse/SPARK-45950
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and make the common-utils module able to run tests on GitHub Actions

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45950.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43834
[https://github.com/apache/spark/pull/43834]

> Fix IvyTestUtils#createIvyDescriptor function and make the common-utils 
> module able to run tests on GitHub Actions
> -
>
> Key: SPARK-45950
> URL: https://issues.apache.org/jira/browse/SPARK-45950
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-45960) Add Python 3.10 to the Daily Python Github Action job

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45960.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43847
[https://github.com/apache/spark/pull/43847]

> Add Python 3.10 to the Daily Python Github Action job
> -
>
> Key: SPARK-45960
> URL: https://issues.apache.org/jira/browse/SPARK-45960
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45961) Document `spark.master.*` configurations

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45961:
--
Fix Version/s: 3.4.2

> Document `spark.master.*` configurations
> 
>
> Key: SPARK-45961
> URL: https://issues.apache.org/jira/browse/SPARK-45961
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>
> Currently, `spark.master.*` configurations are undocumented.
> {code:java}
> $ git grep 'ConfigBuilder("spark.master'
> core/src/main/scala/org/apache/spark/internal/config/UI.scala:  val 
> MASTER_UI_DECOMMISSION_ALLOW_MODE = 
> ConfigBuilder("spark.master.ui.decommission.allow.mode")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_REST_SERVER_ENABLED = 
> ConfigBuilder("spark.master.rest.enabled")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_REST_SERVER_PORT = 
> ConfigBuilder("spark.master.rest.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    
> ConfigBuilder("spark.master.ui.historyServerUrl")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    
> ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}






[jira] [Updated] (SPARK-45961) Document `spark.master.*` configurations

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45961:
--
Fix Version/s: 3.5.1

> Document `spark.master.*` configurations
> 
>
> Key: SPARK-45961
> URL: https://issues.apache.org/jira/browse/SPARK-45961
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>
> Currently, `spark.master.*` configurations are undocumented.
> {code:java}
> $ git grep 'ConfigBuilder("spark.master'
> core/src/main/scala/org/apache/spark/internal/config/UI.scala:  val 
> MASTER_UI_DECOMMISSION_ALLOW_MODE = 
> ConfigBuilder("spark.master.ui.decommission.allow.mode")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_REST_SERVER_ENABLED = 
> ConfigBuilder("spark.master.rest.enabled")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_REST_SERVER_PORT = 
> ConfigBuilder("spark.master.rest.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    
> ConfigBuilder("spark.master.ui.historyServerUrl")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    
> ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}






[jira] [Assigned] (SPARK-45961) Document `spark.master.*` configurations

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45961:
-

Assignee: Dongjoon Hyun

> Document `spark.master.*` configurations
> 
>
> Key: SPARK-45961
> URL: https://issues.apache.org/jira/browse/SPARK-45961
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> Currently, `spark.master.*` configurations are undocumented.
> {code:java}
> $ git grep 'ConfigBuilder("spark.master'
> core/src/main/scala/org/apache/spark/internal/config/UI.scala:  val 
> MASTER_UI_DECOMMISSION_ALLOW_MODE = 
> ConfigBuilder("spark.master.ui.decommission.allow.mode")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_REST_SERVER_ENABLED = 
> ConfigBuilder("spark.master.rest.enabled")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_REST_SERVER_PORT = 
> ConfigBuilder("spark.master.rest.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    
> ConfigBuilder("spark.master.ui.historyServerUrl")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    
> ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}






[jira] [Resolved] (SPARK-45961) Document `spark.master.*` configurations

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45961.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43848
[https://github.com/apache/spark/pull/43848]

> Document `spark.master.*` configurations
> 
>
> Key: SPARK-45961
> URL: https://issues.apache.org/jira/browse/SPARK-45961
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, `spark.master.*` configurations are undocumented.
> {code:java}
> $ git grep 'ConfigBuilder("spark.master'
> core/src/main/scala/org/apache/spark/internal/config/UI.scala:  val 
> MASTER_UI_DECOMMISSION_ALLOW_MODE = 
> ConfigBuilder("spark.master.ui.decommission.allow.mode")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_REST_SERVER_ENABLED = 
> ConfigBuilder("spark.master.rest.enabled")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_REST_SERVER_PORT = 
> ConfigBuilder("spark.master.rest.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  
> private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    
> ConfigBuilder("spark.master.ui.historyServerUrl")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    
> ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}






[jira] [Updated] (SPARK-45962) Remove treatEmptyValuesAsNulls and use nullValue option instead in XML

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45962:
---
Labels: pull-request-available  (was: )

> Remove treatEmptyValuesAsNulls and use nullValue option instead in XML
> --
>
> Key: SPARK-45962
> URL: https://issues.apache.org/jira/browse/SPARK-45962
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
>
> Today, we offer two available options to handle null values. To enhance user 
> clarity and simplify usage, we propose consolidating these into a single 
> option. We recommend retaining the {{nullValue}} option due to its broader 
> semantic scope. 






[jira] [Updated] (SPARK-45961) Document `spark.master.*` configurations

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45961:
---
Labels: pull-request-available  (was: )

> Document `spark.master.*` configurations
> 
>
> Key: SPARK-45961
> URL: https://issues.apache.org/jira/browse/SPARK-45961
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> Currently, `spark.master.*` configurations are undocumented.
> {code:java}
> $ git grep 'ConfigBuilder("spark.master'
> core/src/main/scala/org/apache/spark/internal/config/UI.scala:  val MASTER_UI_DECOMMISSION_ALLOW_MODE = ConfigBuilder("spark.master.ui.decommission.allow.mode")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  private[spark] val MASTER_REST_SERVER_ENABLED = ConfigBuilder("spark.master.rest.enabled")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  private[spark] val MASTER_REST_SERVER_PORT = ConfigBuilder("spark.master.rest.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:  private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    ConfigBuilder("spark.master.ui.historyServerUrl")
> core/src/main/scala/org/apache/spark/internal/config/package.scala:    ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45961) Document `spark.master.*` configurations

2023-11-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45961:
-

 Summary: Document `spark.master.*` configurations
 Key: SPARK-45961
 URL: https://issues.apache.org/jira/browse/SPARK-45961
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 3.4.2, 4.0.0, 3.5.1
Reporter: Dongjoon Hyun


Currently, `spark.master.*` configurations are undocumented.
{code:java}
$ git grep 'ConfigBuilder("spark.master'
core/src/main/scala/org/apache/spark/internal/config/UI.scala:  val MASTER_UI_DECOMMISSION_ALLOW_MODE = ConfigBuilder("spark.master.ui.decommission.allow.mode")
core/src/main/scala/org/apache/spark/internal/config/package.scala:  private[spark] val MASTER_REST_SERVER_ENABLED = ConfigBuilder("spark.master.rest.enabled")
core/src/main/scala/org/apache/spark/internal/config/package.scala:  private[spark] val MASTER_REST_SERVER_PORT = ConfigBuilder("spark.master.rest.port")
core/src/main/scala/org/apache/spark/internal/config/package.scala:  private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port")
core/src/main/scala/org/apache/spark/internal/config/package.scala:    ConfigBuilder("spark.master.ui.historyServerUrl")
core/src/main/scala/org/apache/spark/internal/config/package.scala:    ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45958) Upgrade Arrow to 14.0.1

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45958.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43846
[https://github.com/apache/spark/pull/43846]

> Upgrade Arrow to 14.0.1
> ---
>
> Key: SPARK-45958
> URL: https://issues.apache.org/jira/browse/SPARK-45958
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45946) Fix use of deprecated FileUtils write in RocksDBSuite

2023-11-16 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-45946.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43832
[https://github.com/apache/spark/pull/43832]

> Fix use of deprecated FileUtils write in RocksDBSuite
> -
>
> Key: SPARK-45946
> URL: https://issues.apache.org/jira/browse/SPARK-45946
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Fix use of deprecated FileUtils write in RocksDBSuite



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45960) Add Python 3.10 to the Daily Python Github Action job

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45960:
---
Labels: pull-request-available  (was: )

> Add Python 3.10 to the Daily Python Github Action job
> -
>
> Key: SPARK-45960
> URL: https://issues.apache.org/jira/browse/SPARK-45960
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45960) Add Python 3.10 to the Daily Python Github Action job

2023-11-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45960:
-

 Summary: Add Python 3.10 to the Daily Python Github Action job
 Key: SPARK-45960
 URL: https://issues.apache.org/jira/browse/SPARK-45960
 Project: Spark
  Issue Type: Task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45953) Add Python 3.10 to Infra docker image

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45953.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43840
[https://github.com/apache/spark/pull/43840]

> Add Python 3.10 to Infra docker image
> -
>
> Key: SPARK-45953
> URL: https://issues.apache.org/jira/browse/SPARK-45953
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45953) Add Python 3.10 to Infra docker image

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45953:
-

Assignee: Dongjoon Hyun

> Add Python 3.10 to Infra docker image
> -
>
> Key: SPARK-45953
> URL: https://issues.apache.org/jira/browse/SPARK-45953
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation

2023-11-16 Thread Asif (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asif updated SPARK-45959:
-
Priority: Minor  (was: Major)

> Abusing DataSet.withColumn can cause huge tree with severe perf degradation
> ---
>
> Key: SPARK-45959
> URL: https://issues.apache.org/jira/browse/SPARK-45959
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Priority: Minor
>
> Though the documentation clearly recommends adding all columns in a single 
> shot, in reality it is difficult to expect customers to modify their code: in 
> Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the 
> plans are additionally cloned before being handed to the analyzer, optimizer, 
> etc., which was not the case in Spark 2.
> Together these changes have increased query time from 5 min to 2-3 hrs.
> Often the columns are added to the plan via some for-loop logic that just 
> keeps adding new computation based on some rule.
> So my suggestion is to do an initial check in the withColumn API before 
> creating a new projection: if all the existing columns are still being 
> projected, and the new column's expression depends not on the output of the 
> top node but on its child, then instead of adding a new Project the column 
> can be added to the existing node.
> For a start, maybe we can just handle the Project node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation

2023-11-16 Thread Asif (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786941#comment-17786941
 ] 

Asif commented on SPARK-45959:
--

will create a PR for the same..

> Abusing DataSet.withColumn can cause huge tree with severe perf degradation
> ---
>
> Key: SPARK-45959
> URL: https://issues.apache.org/jira/browse/SPARK-45959
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Priority: Major
>
> Though the documentation clearly recommends adding all columns in a single 
> shot, in reality it is difficult to expect customers to modify their code: in 
> Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the 
> plans are additionally cloned before being handed to the analyzer, optimizer, 
> etc., which was not the case in Spark 2.
> Together these changes have increased query time from 5 min to 2-3 hrs.
> Often the columns are added to the plan via some for-loop logic that just 
> keeps adding new computation based on some rule.
> So my suggestion is to do an initial check in the withColumn API before 
> creating a new projection: if all the existing columns are still being 
> projected, and the new column's expression depends not on the output of the 
> top node but on its child, then instead of adding a new Project the column 
> can be added to the existing node.
> For a start, maybe we can just handle the Project node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation

2023-11-16 Thread Asif (Jira)
Asif created SPARK-45959:


 Summary: Abusing DataSet.withColumn can cause huge tree with 
severe perf degradation
 Key: SPARK-45959
 URL: https://issues.apache.org/jira/browse/SPARK-45959
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.1
Reporter: Asif


Though the documentation clearly recommends adding all columns in a single 
shot, in reality it is difficult to expect customers to modify their code: in 
Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the 
plans are additionally cloned before being handed to the analyzer, optimizer, 
etc., which was not the case in Spark 2.
Together these changes have increased query time from 5 min to 2-3 hrs.
Often the columns are added to the plan via some for-loop logic that just 
keeps adding new computation based on some rule.

So my suggestion is to do an initial check in the withColumn API before 
creating a new projection: if all the existing columns are still being 
projected, and the new column's expression depends not on the output of the 
top node but on its child, then instead of adding a new Project the column can 
be added to the existing node.
For a start, maybe we can just handle the Project node.
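
For illustration, a minimal sketch of the pattern described above; the column 
names and the loop bound are made up. Each withColumn call stacks one more 
Project node on top of the plan, while a single select (or withColumns, 
available since Spark 3.3) adds all columns in one projection:
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

# Anti-pattern: every withColumn call wraps the plan in another Project node,
# so analysis cost grows with the number of added columns.
deep = df
for i in range(200):
    deep = deep.withColumn(f"c{i}", F.col("id") + i)

# Preferred: build the expressions first and add them in a single projection.
flat = df.select("id", *[(F.col("id") + i).alias(f"c{i}") for i in range(200)])
# Equivalent since Spark 3.3:
# flat = df.withColumns({f"c{i}": F.col("id") + i for i in range(200)})
{code}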



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45958) Upgrade Arrow to 14.0.1

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45958:
---
Labels: pull-request-available  (was: )

> Upgrade Arrow to 14.0.1
> ---
>
> Key: SPARK-45958
> URL: https://issues.apache.org/jira/browse/SPARK-45958
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45958) Upgrade Arrow to 14.0.1

2023-11-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45958:
-

 Summary: Upgrade Arrow to 14.0.1
 Key: SPARK-45958
 URL: https://issues.apache.org/jira/browse/SPARK-45958
 Project: Spark
  Issue Type: Task
  Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44924) Add configurations for FileStreamSource cached files

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44924:
---
Labels: pull-request-available  (was: )

> Add configurations for FileStreamSource cached files
> 
>
> Key: SPARK-44924
> URL: https://issues.apache.org/jira/browse/SPARK-44924
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: kevin nacios
>Priority: Minor
>  Labels: pull-request-available
>
> With https://issues.apache.org/jira/browse/SPARK-30866, caching of listed 
> files was added to structured streaming to reduce the cost of relisting from 
> the filesystem each batch. The settings that drive this are currently 
> hardcoded and there is no way to change them.
>  
> This impacts some of our workloads where we process large datasets and it is 
> unknown how "heavy" some files are, so a single batch can take a long time. 
> When we set maxFilesPerTrigger to 100k files, a subsequent batch using the 
> cached maximum of 10k files causes the job to take longer, since the cluster 
> is capable of handling 100k files but is stuck doing 10% of the workload. 
> The benefit of the caching doesn't outweigh the performance cost on the rest 
> of the job.
>  
> With config settings available for this, we could either absorb some 
> increased driver memory usage to cache the next 100k files, or opt to 
> disable caching entirely and just relist files each batch by setting the 
> cache size to 0.
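
For concreteness, a sketch of how such a knob might be used once exposed. The 
configuration name below is hypothetical, standing in for whatever the PR ends 
up adding; only maxFilesPerTrigger is an existing option:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical configuration name -- NOT a released Spark setting. A value of
# 0 would disable the listing cache entirely.
spark.conf.set("spark.sql.streaming.fileSource.maxCachedFiles", "100000")

stream_df = (
    spark.readStream.format("parquet")
    .schema("id LONG, payload STRING")       # streaming file sources need a schema
    .option("maxFilesPerTrigger", "100000")  # existing option: files per micro-batch
    .load("/data/input")
)
{code}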



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45956) Upgrade ZooKeeper to 3.7.2

2023-11-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786919#comment-17786919
 ] 

Dongjoon Hyun commented on SPARK-45956:
---

I collected this as a subtask of SPARK-44111 to give more visibility. Thank you 
for working on this.

> Upgrade ZooKeeper to 3.7.2
> --
>
> Key: SPARK-45956
> URL: https://issues.apache.org/jira/browse/SPARK-45956
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
>
> [CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45956) Upgrade ZooKeeper to 3.7.2

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45956:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Dependency upgrade)

> Upgrade ZooKeeper to 3.7.2
> --
>
> Key: SPARK-45956
> URL: https://issues.apache.org/jira/browse/SPARK-45956
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
>
> [CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44118) Support K8s scheduling gates

2023-11-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786916#comment-17786916
 ] 

Dongjoon Hyun commented on SPARK-44118:
---

We will revisit this after the feature reaches `GA` and most K8s environment 
users can access this feature.

> Support K8s scheduling gates
> 
>
> Key: SPARK-44118
> URL: https://issues.apache.org/jira/browse/SPARK-44118
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> [https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/]
>  - Kubernetes v1.26 [alpha]
>  - Kubernetes v1.27 [beta]
>  - Kubernetes v1.28 [beta]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44118) Support K8s scheduling gates

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44118:
--
Description: 
[https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/]
 - Kubernetes v1.26 [alpha]
 - Kubernetes v1.27 [beta]
 - Kubernetes v1.28 [beta]

  was:
https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/
- Kubernetes v1.26 [alpha]


> Support K8s scheduling gates
> 
>
> Key: SPARK-44118
> URL: https://issues.apache.org/jira/browse/SPARK-44118
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> [https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/]
>  - Kubernetes v1.26 [alpha]
>  - Kubernetes v1.27 [beta]
>  - Kubernetes v1.28 [beta]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44118) Support K8s scheduling gates

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44118:
--
Parent: (was: SPARK-44111)
Issue Type: Improvement  (was: Sub-task)

> Support K8s scheduling gates
> 
>
> Key: SPARK-44118
> URL: https://issues.apache.org/jira/browse/SPARK-44118
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> [https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/]
>  - Kubernetes v1.26 [alpha]
>  - Kubernetes v1.27 [beta]
>  - Kubernetes v1.28 [beta]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44118) Support K8s scheduling gates

2023-11-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786913#comment-17786913
 ] 

Dongjoon Hyun commented on SPARK-44118:
---

This is excluded from Apache Spark 4.0.0 scope because it's still `beta` even 
in K8s 1.28.

> Support K8s scheduling gates
> 
>
> Key: SPARK-44118
> URL: https://issues.apache.org/jira/browse/SPARK-44118
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/
> - Kubernetes v1.26 [alpha]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45957) SQL on streaming Temp view fails

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45957:
---
Labels: pull-request-available  (was: )

> SQL on streaming Temp view fails
> 
>
> Key: SPARK-45957
> URL: https://issues.apache.org/jira/browse/SPARK-45957
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Raghu Angadi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following code fails in the last step with Spark Connect.
> The root cause is that Connect server triggers physical plan on a streaming 
> Dataframe [in 
> SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591].
>  Better to avoid that entirely, but at least for streaming it should be 
> avoided since it cannot be done with a batch execution engine.
> {code:java}
> df = spark.readStream.format("rate").option("numPartitions", "1").load()
> df.createOrReplaceTempView("temp_view")
> view_df = spark.sql("SELECT * FROM temp_view")  # FAILS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45957) SQL on streaming Temp view fails

2023-11-16 Thread Raghu Angadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated SPARK-45957:
-
Description: 
The following code fails in the last step with Spark Connect.

The root cause is that Connect server triggers physical plan on a streaming 
Dataframe [in 
SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591].
 Better to avoid that entirely, but at least for streaming it should be avoided 
since it cannot be done with a batch execution engine.
{code:java}
df = spark.readStream.format("rate").option("numPartitions", "1").load()
df.createOrReplaceTempView("temp_view")

view_df = spark.sql("SELECT * FROM temp_view")  # FAILS{code}
 

  was:
The following code fails in the last step with Spark Connect.

The root cause is that Connect server triggers physical plan on a streaming 
Dataframe [in 
SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591].
 Better to avoid that entirely, but at least for streaming it should be avoided 
since it cannot be done with a batch execution engine.

 

 
{code:java}
df = spark.readStream.format("rate").option("numPartitions", "1").load()
df.createOrReplaceTempView("temp_view")

view_df = spark.sql("SELECT * FROM temp_view")  # FAILS{code}
 


> SQL on streaming Temp view fails
> 
>
> Key: SPARK-45957
> URL: https://issues.apache.org/jira/browse/SPARK-45957
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Raghu Angadi
>Priority: Major
> Fix For: 4.0.0
>
>
> The following code fails in the last step with Spark Connect.
> The root cause is that Connect server triggers physical plan on a streaming 
> Dataframe [in 
> SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591].
>  Better to avoid that entirely, but at least for streaming it should be 
> avoided since it cannot be done with a batch execution engine.
> {code:java}
> df = spark.readStream.format("rate").option("numPartitions", "1").load()
> df.createOrReplaceTempView("temp_view")
> view_df = spark.sql("SELECT * FROM temp_view")  # FAILS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45955) Collapse Support for Flamegraph and thread dump details

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45955.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43842
[https://github.com/apache/spark/pull/43842]

> Collapse Support for Flamegraph and thread dump details
> ---
>
> Key: SPARK-45955
> URL: https://issues.apache.org/jira/browse/SPARK-45955
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45957) SQL on streaming Temp view fails

2023-11-16 Thread Raghu Angadi (Jira)
Raghu Angadi created SPARK-45957:


 Summary: SQL on streaming Temp view fails
 Key: SPARK-45957
 URL: https://issues.apache.org/jira/browse/SPARK-45957
 Project: Spark
  Issue Type: Bug
  Components: Connect, Structured Streaming
Affects Versions: 4.0.0
Reporter: Raghu Angadi
 Fix For: 4.0.0


The following code fails in the last step with Spark Connect.

The root cause is that Connect server triggers physical plan on a streaming 
Dataframe [in 
SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591].
 Better to avoid that entirely, but at least for streaming it should be avoided 
since it cannot be done with a batch execution engine.

 

 
{code:java}
df = spark.readStream.format("rate").option("numPartitions", "1").load()
df.createOrReplaceTempView("temp_view")

view_df = spark.sql("SELECT * FROM temp_view")  # FAILS{code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45955) Collapse Support for Flamegraph and thread dump details

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45955:
-

Assignee: Kent Yao

> Collapse Support for Flamegraph and thread dump details
> ---
>
> Key: SPARK-45955
> URL: https://issues.apache.org/jira/browse/SPARK-45955
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45956) Upgrade ZooKeeper to 3.7.2

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45956:
---
Labels: pull-request-available  (was: )

> Upgrade ZooKeeper to 3.7.2
> --
>
> Key: SPARK-45956
> URL: https://issues.apache.org/jira/browse/SPARK-45956
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
>
> [CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45956) Upgrade ZooKeeper to 3.7.2

2023-11-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-45956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bjørn Jørgensen updated SPARK-45956:

Summary: Upgrade ZooKeeper to 3.7.2  (was: Upgrade ZooKeeper to X.X)

> Upgrade ZooKeeper to 3.7.2
> --
>
> Key: SPARK-45956
> URL: https://issues.apache.org/jira/browse/SPARK-45956
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> [CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45956) Upgrade ZooKeeper to X.X

2023-11-16 Thread Jira
Bjørn Jørgensen created SPARK-45956:
---

 Summary: Upgrade ZooKeeper to X.X
 Key: SPARK-45956
 URL: https://issues.apache.org/jira/browse/SPARK-45956
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Build
Affects Versions: 4.0.0
Reporter: Bjørn Jørgensen


[CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45920) group by ordinal should be idempotent

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45920:
--
Fix Version/s: 3.3.4

> group by ordinal should be idempotent
> -
>
> Key: SPARK-45920
> URL: https://issues.apache.org/jira/browse/SPARK-45920
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45920) group by ordinal should be idempotent

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45920:
--
Fix Version/s: 3.4.2

> group by ordinal should be idempotent
> -
>
> Key: SPARK-45920
> URL: https://issues.apache.org/jira/browse/SPARK-45920
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45920) group by ordinal should be idempotent

2023-11-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45920:
--
Fix Version/s: 3.5.1

> group by ordinal should be idempotent
> -
>
> Key: SPARK-45920
> URL: https://issues.apache.org/jira/browse/SPARK-45920
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43980) Add support for EXCEPT in select clause, similar to what databricks provides

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43980:
---
Labels: pull-request-available  (was: )

> Add support for EXCEPT in select clause, similar to what databricks provides
> 
>
> Key: SPARK-43980
> URL: https://issues.apache.org/jira/browse/SPARK-43980
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yash Kothari
>Priority: Major
>  Labels: pull-request-available
>
> I'm looking for a way to incorporate the {{select * except(col1, ...)}} 
> clause provided by Databricks into my workflow. I don't use Databricks and 
> would like to introduce this {{select except}} clause either as a 
> spark-package or by contributing a change to Spark.
> However, I'm unsure about how to begin this process and would appreciate any 
> guidance from the community.
> [https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select.html#examples]
>  
> Thank you for your assistance.
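
For reference, the closest existing DataFrame-API equivalent of the requested 
clause is drop(); a minimal sketch:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a", True)], ["col1", "col2", "col3"])

# Requested syntax (Databricks SQL): SELECT * EXCEPT (col1) FROM tbl
# Closest existing equivalent in the DataFrame API:
df.drop("col1").show()
{code}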



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45954) Avoid generating redundant ShuffleExchangeExec node

2023-11-16 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45954:

Summary: Avoid generating redundant ShuffleExchangeExec node  (was: Remove 
redundant shuffles)

> Avoid generating redundant ShuffleExchangeExec node
> ---
>
> Key: SPARK-45954
> URL: https://issues.apache.org/jira/browse/SPARK-45954
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45951) Upgrade buf to v1.28.1

2023-11-16 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45951.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43835
[https://github.com/apache/spark/pull/43835]

> Upgrade buf to v1.28.1
> --
>
> Key: SPARK-45951
> URL: https://issues.apache.org/jira/browse/SPARK-45951
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45951) Upgrade buf to v1.28.1

2023-11-16 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-45951:


Assignee: Ruifeng Zheng

> Upgrade buf to v1.28.1
> --
>
> Key: SPARK-45951
> URL: https://issues.apache.org/jira/browse/SPARK-45951
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45955) Collapse Support for Flamegraph and thread dump details

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45955:
---
Labels: pull-request-available  (was: )

> Collapse Support for Flamegraph and thread dump details
> ---
>
> Key: SPARK-45955
> URL: https://issues.apache.org/jira/browse/SPARK-45955
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45955) Collapse Support for Flamegraph and thread dump details

2023-11-16 Thread Kent Yao (Jira)
Kent Yao created SPARK-45955:


 Summary: Collapse Support for Flamegraph and thread dump details
 Key: SPARK-45955
 URL: https://issues.apache.org/jira/browse/SPARK-45955
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45954) Remove redundant shuffles

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45954:
---
Labels: pull-request-available  (was: )

> Remove redundant shuffles
> -
>
> Key: SPARK-45954
> URL: https://issues.apache.org/jira/browse/SPARK-45954
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45954) Remove redundant shuffles

2023-11-16 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45954:
---

 Summary: Remove redundant shuffles
 Key: SPARK-45954
 URL: https://issues.apache.org/jira/browse/SPARK-45954
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and make common-utils module can run tests on GitHub Action

2023-11-16 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45950:
-
Summary: Fix IvyTestUtils#createIvyDescriptor function and make 
common-utils module can run tests on GitHub Action  (was: Make `common-utils` 
module can run tests on GitHub Action)

> Fix IvyTestUtils#createIvyDescriptor function and make common-utils module 
> can run tests on GitHub Action
> -
>
> Key: SPARK-45950
> URL: https://issues.apache.org/jira/browse/SPARK-45950
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and make common-utils module can run tests on GitHub Action

2023-11-16 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45950:
-
Component/s: Spark Core

> Fix IvyTestUtils#createIvyDescriptor function and make common-utils module 
> can run tests on GitHub Action
> -
>
> Key: SPARK-45950
> URL: https://issues.apache.org/jira/browse/SPARK-45950
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and make common-utils module can run tests on GitHub Action

2023-11-16 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45950:
-
Issue Type: Bug  (was: Improvement)

> Fix IvyTestUtils#createIvyDescriptor function and make common-utils module 
> can run tests on GitHub Action
> -
>
> Key: SPARK-45950
> URL: https://issues.apache.org/jira/browse/SPARK-45950
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45414) spark-xml misplaces string tag content

2023-11-16 Thread Giuseppe Ceravolo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786700#comment-17786700
 ] 

Giuseppe Ceravolo commented on SPARK-45414:
---

[~ritikam] I appreciate your support, but I do not want to have to 
manually/programmatically move one or more fields up or down... I am looking 
for an automatic fix for this error.
By the way, I have already put in place (in production) the workaround you are 
suggesting: programmatically moving all string columns down, adding a new 
"fake" column for each of them, writing the file like that, and then reading 
it back to remove the "fake" tags and re-writing it... not the best solution, 
I guess :)
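
A rough, hypothetical sketch of the reordering half of that workaround (the 
data and column names are invented; the "fake" columns would still have to be 
stripped after a read-back, as described above):
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("MyDescription", 1)], ["Description", "Qty"])

# Move plain string columns to the end and follow each with an empty "fake"
# column, so the string content has a trailing tag to absorb the misplacement.
string_cols = [f.name for f in df.schema.fields if isinstance(f.dataType, StringType)]
other_cols = [f.name for f in df.schema.fields if f.name not in string_cols]

reordered = df.select(
    *other_cols,
    *[c for name in string_cols
      for c in (F.col(name), F.lit("").alias(name + "_fake"))],
)
{code}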

> spark-xml misplaces string tag content
> --
>
> Key: SPARK-45414
> URL: https://issues.apache.org/jira/browse/SPARK-45414
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.3.0
>Reporter: Giuseppe Ceravolo
>Priority: Critical
> Attachments: IllegalArgumentException.txt
>
>
> h1. Intro
> Hi all! Please expect some degree of incompleteness in this issue as this is 
> the very first one I post, and feel free to edit it as you like - I welcome 
> your feedback.
> My goal is to provide you with as many details and indications as I can on 
> this issue that I am currently facing with a Client of mine on its Production 
> environment (we use Azure Databricks DBR 11.3 LTS).
> I was told by Sean Owen [[srowen (Sean Owen) 
> (github.com)|https://github.com/srowen]], who maintains the spark-xml maven 
> repository on GitHub [[https://github.com/srowen/spark-xml]] to post an issue 
> here because "This code has been ported to Apache Spark now anyway so won't 
> be updated here" (refer to his comment [here|#issuecomment-1744792958]).
> h1. Issue
> When I write a DataFrame to XML format via the spark-xml library, either (1) 
> I get an error if empty string columns sit between non-string nested ones, 
> or (2) if I put all string columns at the end, I get a wrong XML in which 
> the content of string tags is misplaced into the following tags.
> h1. Code to reproduce the issue
> Please find below the end-to-end code snippet that results into the error
> h2. CASE (1): ERROR
> When empty string columns sit between non-string nested ones, the write 
> fails with the following error.
> _Caused by: java.lang.IllegalArgumentException: Failed to convert value 
> MyDescription (class of class java.lang.String) in type 
> ArrayType(StructType(StructField(_ID,StringType,true),StructField(_Level,StringType,true)),true)
>  to XML._
> Please find attached the full trace of the error.
> {code:python}
> fake_file_df = spark \
>     .sql(
>         """SELECT
>             CAST(STRUCT('ItemId' AS `_Type`, '123' AS `_VALUE`) AS STRUCT<_Type: STRING, _VALUE: STRING>) AS ItemID,
>             CAST(STRUCT('UPC' AS `_Type`, '123' AS `_VALUE`) AS STRUCT<_Type: STRING, _VALUE: STRING>) AS UPC,
>             CAST('' AS STRING) AS _SerialNumberFlag,
>             CAST('MyDescription' AS STRING) AS Description,
>             CAST(ARRAY(STRUCT(NULL AS `_ID`, NULL AS `_Level`)) AS ARRAY<STRUCT<_ID: STRING, _Level: STRING>>) AS MerchandiseHierarchy,
>             CAST(ARRAY(STRUCT(NULL AS `_ValueTypeCode`, NULL AS `_VALUE`)) AS ARRAY<STRUCT<_ValueTypeCode: STRING, _VALUE: STRING>>) AS ItemPrice,
>             CAST('' AS STRING) AS Color,
>             CAST('' AS STRING) AS IntendedIndustry,
>             CAST(STRUCT(NULL AS `Name`) AS STRUCT<Name: STRING>) AS Manufacturer,
>             CAST(STRUCT(NULL AS `Season`) AS STRUCT<Season: STRING>) AS Marketing,
>             CAST(STRUCT(NULL AS `_Name`) AS STRUCT<_Name: STRING>) AS BrandOwner,
>             CAST(ARRAY(STRUCT('Attribute1' AS `_Name`, 'Value1' AS `_VALUE`)) AS ARRAY<STRUCT<_Name: STRING, _VALUE: STRING>>) AS ItemAttribute_culinary,
>             CAST(ARRAY(STRUCT(NULL AS `_Name`, ARRAY(ARRAY(STRUCT(NULL AS `AttributeCode`, NULL AS `AttributeValue`))) AS `_VALUE`)) AS ARRAY<STRUCT<_Name: STRING, _VALUE: ARRAY<ARRAY<STRUCT<AttributeCode: STRING, AttributeValue: STRING>>>>>) AS ItemAttribute_noculinary,
>             CAST(STRUCT(STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Depth`, STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Height`, STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Width`, STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Diameter`) AS STRUCT<Depth: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>, Height: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>, Width: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>, Diameter: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>>) AS ItemMeasurements,
>             CAST(STRUCT('GroupA' AS `TaxGroupID`, 'CodeA' AS `TaxExemptCode`, '1' AS `TaxAmount`) AS STRUCT<TaxGroupID: STRING, TaxExemptCode: STRING, TaxAmount: STRING>) AS TaxInformation,
>             CAST('' AS STRING) AS ItemImageUrl,
>             CAST(ARRAY(ARRAY(STRUCT(NULL AS `_action`, NULL AS `_franchiseeId`, NULL AS `_franchisee

[jira] [Resolved] (SPARK-45851) (Scala) Support different retry policies for connect client

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45851.
--
Fix Version/s: 4.0.0
 Assignee: Alice Sayutina
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/43757

> (Scala) Support different retry policies for connect client
> ---
>
> Key: SPARK-45851
> URL: https://issues.apache.org/jira/browse/SPARK-45851
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support multiple retry policies defined at the same time. Each policy 
> determines which error types it can retry and how exactly.
> For instance, networking errors should generally be retried differently than
> errors caused by a remote resource being unavailable.
> Relevant python ticket: SPARK-45733
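
Illustration only, not the actual connect client API: a generic sketch in 
which each policy declares which errors it handles and supplies its own 
backoff schedule.
{code:python}
import random
import time

# Toy model of layered retry policies; all names here are invented.
class RetryPolicy:
    def __init__(self, name, can_retry, max_retries, base_backoff_s):
        self.name = name
        self.can_retry = can_retry          # predicate over the raised exception
        self.max_retries = max_retries
        self.base_backoff_s = base_backoff_s

def call_with_retries(fn, policies):
    attempts = {p.name: 0 for p in policies}
    while True:
        try:
            return fn()
        except Exception as e:
            policy = next((p for p in policies if p.can_retry(e)), None)
            if policy is None or attempts[policy.name] >= policy.max_retries:
                raise
            attempts[policy.name] += 1
            # exponential backoff with jitter, scaled per policy
            time.sleep(policy.base_backoff_s * (2 ** attempts[policy.name]) * random.random())

policies = [
    RetryPolicy("network", lambda e: isinstance(e, ConnectionError), 5, 0.05),
    RetryPolicy("unavailable", lambda e: isinstance(e, TimeoutError), 2, 1.0),
]
{code}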



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45945) Add a helper function for `parser`

2023-11-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-45945.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43826
[https://github.com/apache/spark/pull/43826]

> Add a helper function for `parser`
> --
>
> Key: SPARK-45945
> URL: https://issues.apache.org/jira/browse/SPARK-45945
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45953) Add Python 3.10 to Infra docker image

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45953:
--

Assignee: Apache Spark

> Add Python 3.10 to Infra docker image
> -
>
> Key: SPARK-45953
> URL: https://issues.apache.org/jira/browse/SPARK-45953
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45953) Add Python 3.10 to Infra docker image

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45953:
--

Assignee: (was: Apache Spark)

> Add Python 3.10 to Infra docker image
> -
>
> Key: SPARK-45953
> URL: https://issues.apache.org/jira/browse/SPARK-45953
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45953) Add Python 3.10 to Infra docker image

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45953:
--

Assignee: (was: Apache Spark)

> Add Python 3.10 to Infra docker image
> -
>
> Key: SPARK-45953
> URL: https://issues.apache.org/jira/browse/SPARK-45953
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45953) Add Python 3.10 to Infra docker image

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45953:
--

Assignee: Apache Spark

> Add Python 3.10 to Infra docker image
> -
>
> Key: SPARK-45953
> URL: https://issues.apache.org/jira/browse/SPARK-45953
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45953) Add Python 3.10 to Infra docker image

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45953:
---
Labels: pull-request-available  (was: )

> Add Python 3.10 to Infra docker image
> -
>
> Key: SPARK-45953
> URL: https://issues.apache.org/jira/browse/SPARK-45953
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45953) Add Python 3.10 to Infra docker image

2023-11-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45953:
-

 Summary: Add Python 3.10 to Infra docker image
 Key: SPARK-45953
 URL: https://issues.apache.org/jira/browse/SPARK-45953
 Project: Spark
  Issue Type: Task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45929) support grouping set operation in dataframe api

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45929:
--

Assignee: (was: Apache Spark)

> support grouping set operation in dataframe api
> ---
>
> Key: SPARK-45929
> URL: https://issues.apache.org/jira/browse/SPARK-45929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: JacobZheng
>Priority: Major
>  Labels: pull-request-available
>
> I am using the Spark DataFrame API for complex calculations. When I need the 
> grouping sets function, I can only convert the expressions to SQL via the 
> analyzed plan and then splice them into one complex SQL statement to execute. 
> In some cases this operation generates extremely complex SQL, and while 
> executing it antlr4 keeps consuming a large amount of memory, similar to a 
> memory-leak scenario. If grouping sets could be computed through the 
> DataFrame API, like the rollup and cube functions, these operations would be 
> much simpler.
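
For context, rollup and cube are already exposed on DataFrame, while arbitrary 
GROUPING SETS still require the SQL round-trip described above; a small sketch 
with invented data:
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("US", "web", 10), ("US", "app", 20), ("DE", "web", 30)],
    ["country", "channel", "sales"],
)

# Already possible in the DataFrame API:
df.rollup("country", "channel").agg(F.sum("sales")).show()
df.cube("country", "channel").agg(F.sum("sales")).show()

# Arbitrary grouping sets still require the SQL round-trip:
df.createOrReplaceTempView("t")
spark.sql("""
    SELECT country, channel, SUM(sales)
    FROM t
    GROUP BY GROUPING SETS ((country), (channel))
""").show()
{code}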



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45929) support grouping set operation in dataframe api

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45929:
--

Assignee: Apache Spark

> support grouping set operation in dataframe api
> ---
>
> Key: SPARK-45929
> URL: https://issues.apache.org/jira/browse/SPARK-45929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: JacobZheng
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> I am using the Spark DataFrame API for complex calculations. When I need 
> the grouping sets function, I can only convert the expressions to SQL via 
> analyzedPlan and then splice those fragments into one large SQL statement 
> to execute. In some cases this produces an extremely complex SQL string, 
> and while parsing it antlr4 keeps consuming a large amount of memory, 
> similar to a memory-leak scenario. Supporting grouping sets in the 
> DataFrame API, alongside the existing rollup and cube functions, would 
> make these computations much simpler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45952) Use built-in math constant in math functions

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45952:
--

Assignee: (was: Apache Spark)

> Use built-in math constant in math functions 
> -
>
> Key: SPARK-45952
> URL: https://issues.apache.org/jira/browse/SPARK-45952
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>
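
The ticket carries no description, so the following is only a hedged guess 
at what its title suggests: preferring Python's built-in math constants 
(math.pi, math.e) over hand-typed decimal literals in PySpark math-function 
documentation and tests. Purely illustrative, not code from the change:

{code:python}
import math

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Check a math function against the built-in constant instead of a
# hand-typed literal such as 3.141592653589793:
row = spark.range(1).select(F.acos(F.lit(-1.0)).alias("pi")).first()
assert abs(row["pi"] - math.pi) < 1e-12
{code}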




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45952) Use built-in math constant in math functions

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45952:
--

Assignee: Apache Spark

> Use built-in math constant in math functions 
> -
>
> Key: SPARK-45952
> URL: https://issues.apache.org/jira/browse/SPARK-45952
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45952) Use built-in math constant in math functions

2023-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45952:
---
Labels: pull-request-available  (was: )

> Use built-in math constant in math functions 
> -
>
> Key: SPARK-45952
> URL: https://issues.apache.org/jira/browse/SPARK-45952
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


