[jira] [Assigned] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used
[ https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42898:
    Assignee: (was: Apache Spark)

> Key: SPARK-42898
> URL: https://issues.apache.org/jira/browse/SPARK-42898
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Robert Joseph Evans
> Priority: Major
>
> This is really minor, but SPARK-35581 removed the need for a timezone when
> casting from a `StringType` to a `DateType`, but the patch didn't update the
> `needsTimeZone` function to indicate that it was no longer required.
> Currently, casting from a DateType to a StringType also says that it needs
> the timezone, but it only uses the `DateFormatter` with its default
> parameters, which do not use the time zone at all.
> I think this can be fixed with just a two-line change.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
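The reason neither cast direction needs a timezone can be modeled outside Spark: a DateType value is conceptually a day count, so formatting or parsing it never consults a time zone. A plain-Python sketch of that idea (the function names here are illustrative, not Spark's actual `Cast` code):

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def date_to_string(days_since_epoch: int) -> str:
    # Mirrors DateFormatter's default ISO-8601 output: no timezone involved.
    return date.fromordinal(EPOCH.toordinal() + days_since_epoch).isoformat()

def string_to_date(s: str) -> int:
    # Parsing a plain date string is equally timezone-free.
    return date.fromisoformat(s).toordinal() - EPOCH.toordinal()

# Round trip requires no timezone parameter anywhere.
assert string_to_date(date_to_string(19000)) == 19000
```

This is why `needsTimeZone` can safely return false for both string-to-date and date-to-string casts.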
[jira] [Commented] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used
[ https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703720#comment-17703720 ] Apache Spark commented on SPARK-42898: User 'revans2' has created a pull request for this issue: https://github.com/apache/spark/pull/40524
[jira] [Assigned] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used
[ https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42898: Assignee: Apache Spark
[jira] [Commented] (SPARK-42897) Avoid evaluate more than once for the variables from the left side in the FullOuter SMJ condition
[ https://issues.apache.org/jira/browse/SPARK-42897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703686#comment-17703686 ]

Apache Spark commented on SPARK-42897:

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/40523

> Key: SPARK-42897
> URL: https://issues.apache.org/jira/browse/SPARK-42897
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Wan Kun
> Priority: Minor
>
> Codegen fails for a FullOuter SMJ, for example:
> {code}
> val df1 = spark.range(5).select($"id".as("k1"))
> val df2 = spark.range(10).select($"id".as("k2"))
> df1.join(df2.hint("SHUFFLE_MERGE"),
>   $"k1" === $"k2" % 3 && $"k1" + 3 =!= $"k2" && $"k1" + 5 =!= $"k2",
>   "full_outer")
> {code}
> The join condition *$"k1" + 3 =!= $"k2" && $"k1" + 5 =!= $"k2"* evaluates
> the variable *k1* in both predicates, which causes codegen to fail.
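The duplicate-evaluation problem can be modeled without Spark's codegen: if each predicate independently pulls `k1` out of the row, the generated code evaluates (and declares) it twice. The `CountingRow` class below is purely illustrative; it sketches the idea of materializing the left-side variable once and reusing it:

```python
class CountingRow:
    """Row wrapper that counts how often a column is evaluated."""
    def __init__(self, values):
        self.values = values
        self.evals = 0

    def get(self, name):
        self.evals += 1  # one "evaluation" per access, like generated code
        return self.values[name]

def condition_buggy(left, k2):
    # Each predicate re-evaluates k1: mirrors the duplicate declarations
    # that break the generated FullOuter SMJ condition.
    return left.get("k1") + 3 != k2 and left.get("k1") + 5 != k2

def condition_fixed(left, k2):
    k1 = left.get("k1")  # evaluate once, reuse in every predicate
    return k1 + 3 != k2 and k1 + 5 != k2
```

With `CountingRow({"k1": 1})`, the buggy form performs two evaluations of `k1` while the fixed form performs one, which is the behavior the patch aims for in the generated code.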
[jira] [Assigned] (SPARK-42897) Avoid evaluate more than once for the variables from the left side in the FullOuter SMJ condition
[ https://issues.apache.org/jira/browse/SPARK-42897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42897: Assignee: Apache Spark
[jira] [Assigned] (SPARK-42897) Avoid evaluate more than once for the variables from the left side in the FullOuter SMJ condition
[ https://issues.apache.org/jira/browse/SPARK-42897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42897: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage
[ https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703677#comment-17703677 ]

Apache Spark commented on SPARK-42101:

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/40522

> Key: SPARK-42101
> URL: https://issues.apache.org/jira/browse/SPARK-42101
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Major
> Fix For: 3.5.0
>
> The first access to a cached plan with AQE enabled is tricky: currently we
> cannot preserve its output partitioning and ordering. The whole query plan
> also misses many optimizations in the AQE framework. Wrapping
> InMemoryTableScanExec in a query stage resolves these issues.
[jira] [Commented] (SPARK-42896) Make `mapInPandas` / `mapInArrow` support barrier mode execution
[ https://issues.apache.org/jira/browse/SPARK-42896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703637#comment-17703637 ]

Apache Spark commented on SPARK-42896:

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/40520

> Key: SPARK-42896
> URL: https://issues.apache.org/jira/browse/SPARK-42896
> Project: Spark
> Issue Type: New Feature
> Components: Pandas API on Spark, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Weichen Xu
> Priority: Major
>
> Make `mapInPandas` / `mapInArrow` support barrier mode execution.
[jira] [Assigned] (SPARK-42896) Make `mapInPandas` / `mapInArrow` support barrier mode execution
[ https://issues.apache.org/jira/browse/SPARK-42896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42896: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-42896) Make `mapInPandas` / `mapInArrow` support barrier mode execution
[ https://issues.apache.org/jira/browse/SPARK-42896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42896: Assignee: Apache Spark
[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703615#comment-17703615 ]

Apache Spark commented on SPARK-42864:

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40519

> Key: SPARK-42864
> URL: https://issues.apache.org/jira/browse/SPARK-42864
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.4.0
[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703614#comment-17703614 ] Apache Spark commented on SPARK-42864: User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40519
[jira] [Commented] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel
[ https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703541#comment-17703541 ]

Apache Spark commented on SPARK-42889:

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40518

> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Takuya Ueshin
> Assignee: Takuya Ueshin
> Priority: Major
> Fix For: 3.4.0
[jira] [Commented] (SPARK-42508) Extract the common .ml classes to `mllib-common`
[ https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703503#comment-17703503 ]

Apache Spark commented on SPARK-42508:

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40517

> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, ML
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.5.0
[jira] [Assigned] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel
[ https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42894:
    Assignee: Apache Spark

> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel
[ https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42894: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel
[ https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703493#comment-17703493 ] Apache Spark commented on SPARK-42894: User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40516
[jira] [Commented] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel
[ https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703492#comment-17703492 ] Apache Spark commented on SPARK-42894: User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40516
[jira] [Commented] (SPARK-42884) Add Ammonite REPL support
[ https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703459#comment-17703459 ]

Apache Spark commented on SPARK-42884:

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40515

> Key: SPARK-42884
> URL: https://issues.apache.org/jira/browse/SPARK-42884
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Assignee: Herman van Hövell
> Priority: Major
[jira] [Assigned] (SPARK-42884) Add Ammonite REPL support
[ https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42884: Assignee: Apache Spark (was: Herman van Hövell)
[jira] [Assigned] (SPARK-42884) Add Ammonite REPL support
[ https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42884: Assignee: Herman van Hövell (was: Apache Spark)
[jira] [Commented] (SPARK-42884) Add Ammonite REPL support
[ https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703458#comment-17703458 ] Apache Spark commented on SPARK-42884: User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40515
[jira] [Commented] (SPARK-41233) High-order function: array_prepend
[ https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703454#comment-17703454 ]

Apache Spark commented on SPARK-41233:

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40514

> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
> Fix For: 3.5.0
>
> Refer to
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
>
> 1. About data type validation:
> In Snowflake's array_append, array_prepend and array_insert functions, the
> element data type does not need to match the data type of the existing
> elements in the array. In Spark, however, we want to leverage the same data
> type validation as array_remove.
>
> 2. About NULL handling:
> Currently, SparkSQL, SnowSQL and PostgreSQL deal with NULL values in
> different ways. The existing functions array_contains, array_position and
> array_remove in SparkSQL handle NULL this way: if the input array and/or
> element is NULL, return NULL. However, array_prepend should break from this
> behavior and handle NULL as follows:
> 2.1, if the array is NULL, return NULL;
> 2.2, if the array is not NULL and the element is NULL, prepend the NULL
> value to the array.
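The proposed NULL rules can be stated as a tiny model, with plain Python standing in for SQL (None for NULL, lists for arrays). This is only a sketch of the rules above, not Spark's implementation:

```python
def array_prepend(arr, elem):
    # Rule 2.1: if the array is NULL, the result is NULL.
    if arr is None:
        return None
    # Rule 2.2: a NULL element is prepended as a NULL entry,
    # rather than making the whole result NULL.
    return [elem] + arr

assert array_prepend(None, 1) is None           # NULL array -> NULL
assert array_prepend([2, 3], None) == [None, 2, 3]
assert array_prepend([2, 3], 1) == [1, 2, 3]
```

The second assertion is the behavioral break from array_contains / array_position / array_remove, which would return NULL whenever either input is NULL.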
[jira] [Commented] (SPARK-41233) High-order function: array_prepend
[ https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703455#comment-17703455 ] Apache Spark commented on SPARK-41233: User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40514
[jira] [Assigned] (SPARK-42893) Block Arrow-optimized Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42893:
    Assignee: Apache Spark

> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Apache Spark
> Priority: Major
>
> Considering the upcoming improvements to the result inconsistencies between
> traditional pickled Python UDFs and Arrow-optimized Python UDFs, we should
> block the feature for now; otherwise, users who try it out will see behavior
> changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) was introduced in
> Spark 3.4, we should ensure the feature is ready in both vanilla PySpark and
> SCPC at the same time for compatibility.
[jira] [Commented] (SPARK-42893) Block Arrow-optimized Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703446#comment-17703446 ] Apache Spark commented on SPARK-42893: User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40513
[jira] [Commented] (SPARK-42893) Block Arrow-optimized Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703447#comment-17703447 ] Apache Spark commented on SPARK-42893: User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40513
[jira] [Assigned] (SPARK-42893) Block Arrow-optimized Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42893: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-42892) Move sameType and relevant methods out of DataType
[ https://issues.apache.org/jira/browse/SPARK-42892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42892:
    Assignee: Rui Wang (was: Apache Spark)

> Key: SPARK-42892
> URL: https://issues.apache.org/jira/browse/SPARK-42892
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
[jira] [Assigned] (SPARK-42892) Move sameType and relevant methods out of DataType
[ https://issues.apache.org/jira/browse/SPARK-42892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42892: Assignee: Apache Spark (was: Rui Wang)
[jira] [Commented] (SPARK-42892) Move sameType and relevant methods out of DataType
[ https://issues.apache.org/jira/browse/SPARK-42892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703436#comment-17703436 ] Apache Spark commented on SPARK-42892: User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40512
[jira] [Assigned] (SPARK-42891) Implement CoGrouped Map API
[ https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42891: Assignee: (was: Apache Spark) > Implement CoGrouped Map API > --- > > Key: SPARK-42891 > URL: https://issues.apache.org/jira/browse/SPARK-42891 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement CoGrouped Map API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42891) Implement CoGrouped Map API
[ https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703429#comment-17703429 ] Apache Spark commented on SPARK-42891: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40487 > Implement CoGrouped Map API > --- > > Key: SPARK-42891 > URL: https://issues.apache.org/jira/browse/SPARK-42891 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement CoGrouped Map API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42891) Implement CoGrouped Map API
[ https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703428#comment-17703428 ] Apache Spark commented on SPARK-42891: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40487 > Implement CoGrouped Map API > --- > > Key: SPARK-42891 > URL: https://issues.apache.org/jira/browse/SPARK-42891 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement CoGrouped Map API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42891) Implement CoGrouped Map API
[ https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42891: Assignee: Apache Spark > Implement CoGrouped Map API > --- > > Key: SPARK-42891 > URL: https://issues.apache.org/jira/browse/SPARK-42891 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Implement CoGrouped Map API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.
[ https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703401#comment-17703401 ] Apache Spark commented on SPARK-42888: -- User 'cnauroth' has created a pull request for this issue: https://github.com/apache/spark/pull/40511 > Upgrade GCS connector from 2.2.7 to 2.2.11. > --- > > Key: SPARK-42888 > URL: https://issues.apache.org/jira/browse/SPARK-42888 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.1 >Reporter: Chris Nauroth >Priority: Minor > > Upgrade the [GCS > Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs] > bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release > contains multiple bug fixes and enhancements discussed in the [Release > Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md]. > Notable changes include: > * Improved socket timeout handling. > * Trace logging capabilities. > * Fix bug that prevented usage of GCS as a [Hadoop Credential > Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html]. > * Dependency upgrades. > * Support OAuth2 based client authentication. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.
[ https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42888: Assignee: (was: Apache Spark) > Upgrade GCS connector from 2.2.7 to 2.2.11. > --- > > Key: SPARK-42888 > URL: https://issues.apache.org/jira/browse/SPARK-42888 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.1 >Reporter: Chris Nauroth >Priority: Minor > > Upgrade the [GCS > Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs] > bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release > contains multiple bug fixes and enhancements discussed in the [Release > Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md]. > Notable changes include: > * Improved socket timeout handling. > * Trace logging capabilities. > * Fix bug that prevented usage of GCS as a [Hadoop Credential > Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html]. > * Dependency upgrades. > * Support OAuth2 based client authentication. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.
[ https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42888: Assignee: Apache Spark > Upgrade GCS connector from 2.2.7 to 2.2.11. > --- > > Key: SPARK-42888 > URL: https://issues.apache.org/jira/browse/SPARK-42888 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.1 >Reporter: Chris Nauroth >Assignee: Apache Spark >Priority: Minor > > Upgrade the [GCS > Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs] > bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release > contains multiple bug fixes and enhancements discussed in the [Release > Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md]. > Notable changes include: > * Improved socket timeout handling. > * Trace logging capabilities. > * Fix bug that prevented usage of GCS as a [Hadoop Credential > Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html]. > * Dependency upgrades. > * Support OAuth2 based client authentication. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.
[ https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703400#comment-17703400 ] Apache Spark commented on SPARK-42888: -- User 'cnauroth' has created a pull request for this issue: https://github.com/apache/spark/pull/40511 > Upgrade GCS connector from 2.2.7 to 2.2.11. > --- > > Key: SPARK-42888 > URL: https://issues.apache.org/jira/browse/SPARK-42888 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.1 >Reporter: Chris Nauroth >Priority: Minor > > Upgrade the [GCS > Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs] > bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release > contains multiple bug fixes and enhancements discussed in the [Release > Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md]. > Notable changes include: > * Improved socket timeout handling. > * Trace logging capabilities. > * Fix bug that prevented usage of GCS as a [Hadoop Credential > Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html]. > * Dependency upgrades. > * Support OAuth2 based client authentication. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel
[ https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703392#comment-17703392 ] Apache Spark commented on SPARK-42889: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40510 > Implement cache, persist, unpersist, and storageLevel > - > > Key: SPARK-42889 > URL: https://issues.apache.org/jira/browse/SPARK-42889 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel
[ https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42889: Assignee: (was: Apache Spark) > Implement cache, persist, unpersist, and storageLevel > - > > Key: SPARK-42889 > URL: https://issues.apache.org/jira/browse/SPARK-42889 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel
[ https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42889: Assignee: Apache Spark > Implement cache, persist, unpersist, and storageLevel > - > > Key: SPARK-42889 > URL: https://issues.apache.org/jira/browse/SPARK-42889 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000
[ https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42838: Assignee: (was: Apache Spark) > Assign a name to the error class _LEGACY_ERROR_TEMP_2000 > > > Key: SPARK-42838 > URL: https://issues.apache.org/jira/browse/SPARK-42838 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > exist yet. Check the exception fields by using {*}checkError(){*}. That function > checks only the valuable error fields, and avoids depending on the error text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error to checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose to users a way to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000
[ https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703286#comment-17703286 ] Apache Spark commented on SPARK-42838: -- User 'unical1988' has created a pull request for this issue: https://github.com/apache/spark/pull/40468 > Assign a name to the error class _LEGACY_ERROR_TEMP_2000 > > > Key: SPARK-42838 > URL: https://issues.apache.org/jira/browse/SPARK-42838 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > exist yet. Check the exception fields by using {*}checkError(){*}. That function > checks only the valuable error fields, and avoids depending on the error text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error to checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose to users a way to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000
[ https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42838: Assignee: Apache Spark > Assign a name to the error class _LEGACY_ERROR_TEMP_2000 > > > Key: SPARK-42838 > URL: https://issues.apache.org/jira/browse/SPARK-42838 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > exist yet. Check the exception fields by using {*}checkError(){*}. That function > checks only the valuable error fields, and avoids depending on the error text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error to checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose to users a way to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1
[ https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42885: Assignee: (was: Apache Spark) > Upgrade `kubernetes-client` to 6.5.1 > > > Key: SPARK-42885 > URL: https://issues.apache.org/jira/browse/SPARK-42885 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1
[ https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42885: Assignee: Apache Spark > Upgrade `kubernetes-client` to 6.5.1 > > > Key: SPARK-42885 > URL: https://issues.apache.org/jira/browse/SPARK-42885 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1
[ https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703272#comment-17703272 ] Apache Spark commented on SPARK-42885: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40509 > Upgrade `kubernetes-client` to 6.5.1 > > > Key: SPARK-42885 > URL: https://issues.apache.org/jira/browse/SPARK-42885 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42662) Add `_distributed_sequence_id` for distributed-sequence index.
[ https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703112#comment-17703112 ] Apache Spark commented on SPARK-42662: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40507 > Add `_distributed_sequence_id` for distributed-sequence index. > -- > > Key: SPARK-42662 > URL: https://issues.apache.org/jira/browse/SPARK-42662 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark, PySpark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Turn `withSequenceColumn` into a PySpark internal API to support the > distributed-sequence index of the pandas API on Spark in Spark Connect as > well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
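The idea behind the distributed-sequence index referenced above can be illustrated without Spark: count the rows in each partition, turn the counts into cumulative offsets, then number rows within each partition independently. The function and partition layout below are a hypothetical pure-Python sketch of that technique, not PySpark's `withSequenceColumn` implementation:

```python
from itertools import accumulate

def distributed_sequence_ids(partitions):
    """Assign a globally consecutive id to every row across partitions.

    Mirrors the idea behind the distributed-sequence index: count rows
    per partition first, then use cumulative counts as offsets so each
    partition can number its own rows without coordination.
    """
    counts = [len(p) for p in partitions]
    # Offsets: 0, len(p0), len(p0) + len(p1), ...
    offsets = [0] + list(accumulate(counts))[:-1]
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]

parts = [["a", "b"], ["c"], ["d", "e", "f"]]
numbered = distributed_sequence_ids(parts)
# numbered[2][0] == (3, "d"): partition 2 starts at offset 2 + 1 = 3
```

Only the per-partition counts need to be gathered centrally; the numbering itself stays embarrassingly parallel, which is what makes this index type cheap in a distributed setting.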
[jira] [Commented] (SPARK-42881) get_json_object Codegen Support
[ https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703078#comment-17703078 ] Apache Spark commented on SPARK-42881: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40506 > get_json_object Codegen Support > --- > > Key: SPARK-42881 > URL: https://issues.apache.org/jira/browse/SPARK-42881 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42881) get_json_object Codegen Support
[ https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703077#comment-17703077 ] Apache Spark commented on SPARK-42881: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40506 > get_json_object Codegen Support > --- > > Key: SPARK-42881 > URL: https://issues.apache.org/jira/browse/SPARK-42881 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42881) get_json_object Codegen Support
[ https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42881: Assignee: (was: Apache Spark) > get_json_object Codegen Support > --- > > Key: SPARK-42881 > URL: https://issues.apache.org/jira/browse/SPARK-42881 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42881) get_json_object Codegen Support
[ https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42881: Assignee: Apache Spark > get_json_object Codegen Support > --- > > Key: SPARK-42881 > URL: https://issues.apache.org/jira/browse/SPARK-42881 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
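For context on the function whose codegen the ticket above targets: `get_json_object` extracts a value from a JSON string using a simplified JSON path such as `$.store.book[0].title`. The following is a rough pure-Python approximation of those semantics (dotted fields and list indices only), written as an illustration and not as Spark's actual implementation:

```python
import json
import re

def get_json_object(json_str, path):
    """Roughly approximate Spark's get_json_object for paths like
    '$.a.b[0].c'. Returns None on any miss, as Spark does."""
    if not path.startswith("$"):
        return None
    try:
        obj = json.loads(json_str)
    except ValueError:
        return None
    # Split '$.a.b[0]' into tokens: ('a', ''), ('b', ''), ('', '0')
    for key, idx in re.findall(r"\.(\w+)|\[(\d+)\]", path):
        try:
            obj = obj[key] if key else obj[int(idx)]
        except (KeyError, IndexError, TypeError):
            return None
    # Non-string results come back re-serialized as JSON text
    return obj if isinstance(obj, str) else json.dumps(obj)

doc = '{"store": {"book": [{"title": "Spark"}]}}'
print(get_json_object(doc, "$.store.book[0].title"))  # prints: Spark
```

In Spark today this expression is evaluated interpretively per row; codegen support would compile the path walk into generated Java, avoiding the per-row interpretation overhead.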
[jira] [Assigned] (SPARK-42880) Improve the YARN document for log4j2 configuration
[ https://issues.apache.org/jira/browse/SPARK-42880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42880: Assignee: Apache Spark > Improve the YARN document for log4j2 configuration > - > > Key: SPARK-42880 > URL: https://issues.apache.org/jira/browse/SPARK-42880 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.3.2 >Reporter: Zhifang Li >Assignee: Apache Spark >Priority: Minor > > Since Spark 3.3 changed from log4j1 to log4j2, some documents should also be > updated. > For example, docs/running-on-yarn.md still uses log4j1 syntax, as follows: > `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42880) Improve the YARN document for log4j2 configuration
[ https://issues.apache.org/jira/browse/SPARK-42880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42880: Assignee: (was: Apache Spark) > Improve the YARN document for log4j2 configuration > - > > Key: SPARK-42880 > URL: https://issues.apache.org/jira/browse/SPARK-42880 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.3.2 >Reporter: Zhifang Li >Priority: Minor > > Since Spark 3.3 changed from log4j1 to log4j2, some documents should also be > updated. > For example, docs/running-on-yarn.md still uses log4j1 syntax, as follows: > `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42880) Improve the YARN document for log4j2 configuration
[ https://issues.apache.org/jira/browse/SPARK-42880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703071#comment-17703071 ] Apache Spark commented on SPARK-42880: -- User 'frankliee' has created a pull request for this issue: https://github.com/apache/spark/pull/40504 > Improve the YARN document for log4j2 configuration > - > > Key: SPARK-42880 > URL: https://issues.apache.org/jira/browse/SPARK-42880 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.3.2 >Reporter: Zhifang Li >Priority: Minor > > Since Spark 3.3 changed from log4j1 to log4j2, some documents should also be > updated. > For example, docs/running-on-yarn.md still uses log4j1 syntax, as follows: > `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
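As a concrete illustration of the gap the SPARK-42880 description points out, the quoted log4j1 line would translate to something like the following log4j2 properties syntax. The appender name, type, and layout here are illustrative assumptions, not the exact text proposed for the Spark docs; note that log4j2 reads the system property through a `${sys:...}` lookup:

```properties
# log4j2 equivalent of the log4j1 example quoted above (names are illustrative)
appender.file_appender.type = File
appender.file_appender.name = file_appender
appender.file_appender.fileName = ${sys:spark.yarn.app.container.log.dir}/spark.log
appender.file_appender.layout.type = PatternLayout
appender.file_appender.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```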
[jira] [Commented] (SPARK-42878) Named Table should support options
[ https://issues.apache.org/jira/browse/SPARK-42878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703050#comment-17703050 ] Apache Spark commented on SPARK-42878: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40498 > Named Table should support options > -- > > Key: SPARK-42878 > URL: https://issues.apache.org/jira/browse/SPARK-42878 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42878) Named Table should support options
[ https://issues.apache.org/jira/browse/SPARK-42878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42878: Assignee: Apache Spark (was: Rui Wang) > Named Table should support options > -- > > Key: SPARK-42878 > URL: https://issues.apache.org/jira/browse/SPARK-42878 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42878) Named Table should support options
[ https://issues.apache.org/jira/browse/SPARK-42878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42878: Assignee: Rui Wang (was: Apache Spark) > Named Table should support options > -- > > Key: SPARK-42878 > URL: https://issues.apache.org/jira/browse/SPARK-42878 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42878) Named Table should support options
[ https://issues.apache.org/jira/browse/SPARK-42878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703048#comment-17703048 ] Apache Spark commented on SPARK-42878: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40498 > Named Table should support options > -- > > Key: SPARK-42878 > URL: https://issues.apache.org/jira/browse/SPARK-42878 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42830) Link skipped stages on Spark UI
[ https://issues.apache.org/jira/browse/SPARK-42830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42830: Assignee: (was: Apache Spark) > Link skipped stages on Spark UI > --- > > Key: SPARK-42830 > URL: https://issues.apache.org/jira/browse/SPARK-42830 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.2 >Reporter: Yian Liou >Priority: Major > > Add a link to the skipped Spark stages so that it's easier to find the > execution details on the UI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42830) Link skipped stages on Spark UI
[ https://issues.apache.org/jira/browse/SPARK-42830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42830: Assignee: Apache Spark > Link skipped stages on Spark UI > --- > > Key: SPARK-42830 > URL: https://issues.apache.org/jira/browse/SPARK-42830 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.2 >Reporter: Yian Liou >Assignee: Apache Spark >Priority: Major > > Add a link to the skipped Spark stages so that it's easier to find the > execution details on the UI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42830) Link skipped stages on Spark UI
[ https://issues.apache.org/jira/browse/SPARK-42830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703046#comment-17703046 ] Apache Spark commented on SPARK-42830: -- User 'yliou' has created a pull request for this issue: https://github.com/apache/spark/pull/40503 > Link skipped stages on Spark UI > --- > > Key: SPARK-42830 > URL: https://issues.apache.org/jira/browse/SPARK-42830 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.2 >Reporter: Yian Liou >Priority: Major > > Add a link to the skipped Spark stages so that it's easier to find the > execution details on the UI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42829: Assignee: Apache Spark > Added Identifier to the cached RDD operator on the Stages page > --- > > Key: SPARK-42829 > URL: https://issues.apache.org/jira/browse/SPARK-42829 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.2 >Reporter: Yian Liou >Assignee: Apache Spark >Priority: Major > Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png > > > On the stages page in the Web UI, there is no distinction for which cached > RDD is being executed in a particular stage. This Jira aims to add a repeat > identifier to distinguish which cached RDD is being executed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703039#comment-17703039 ] Apache Spark commented on SPARK-42829: -- User 'yliou' has created a pull request for this issue: https://github.com/apache/spark/pull/40502 > Added Identifier to the cached RDD operator on the Stages page > --- > > Key: SPARK-42829 > URL: https://issues.apache.org/jira/browse/SPARK-42829 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.2 >Reporter: Yian Liou >Priority: Major > Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png > > > On the stages page in the Web UI, there is no distinction for which cached > RDD is being executed in a particular stage. This Jira aims to add a repeat > identifier to distinguish which cached RDD is being executed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42829: Assignee: (was: Apache Spark) > Added Identifier to the cached RDD operator on the Stages page > --- > > Key: SPARK-42829 > URL: https://issues.apache.org/jira/browse/SPARK-42829 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.2 >Reporter: Yian Liou >Priority: Major > Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png > > > On the stages page in the Web UI, there is no distinction for which cached > RDD is being executed in a particular stage. This Jira aims to add a repeat > identifier to distinguish which cached RDD is being executed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703026#comment-17703026 ] Apache Spark commented on SPARK-42864: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40501 > Review and fix issues in MLlib API docs > --- > > Key: SPARK-42864 > URL: https://issues.apache.org/jira/browse/SPARK-42864 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703017#comment-17703017 ] Apache Spark commented on SPARK-42864: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40500 > Review and fix issues in MLlib API docs > --- > > Key: SPARK-42864 > URL: https://issues.apache.org/jira/browse/SPARK-42864 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703018#comment-17703018 ] Apache Spark commented on SPARK-42864: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40500 > Review and fix issues in MLlib API docs > --- > > Key: SPARK-42864 > URL: https://issues.apache.org/jira/browse/SPARK-42864 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42864: Assignee: Apache Spark (was: Ruifeng Zheng) > Review and fix issues in MLlib API docs > --- > > Key: SPARK-42864 > URL: https://issues.apache.org/jira/browse/SPARK-42864 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42864: Assignee: Ruifeng Zheng (was: Apache Spark) > Review and fix issues in MLlib API docs > --- > > Key: SPARK-42864 > URL: https://issues.apache.org/jira/browse/SPARK-42864 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42876) DataType's physicalDataType should be private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42876: Assignee: Rui Wang (was: Apache Spark) > DataType's physicalDataType should be private[sql] > -- > > Key: SPARK-42876 > URL: https://issues.apache.org/jira/browse/SPARK-42876 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42876) DataType's physicalDataType should be private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42876: Assignee: Apache Spark (was: Rui Wang) > DataType's physicalDataType should be private[sql] > -- > > Key: SPARK-42876 > URL: https://issues.apache.org/jira/browse/SPARK-42876 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42876) DataType's physicalDataType should be private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702978#comment-17702978 ] Apache Spark commented on SPARK-42876: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40499 > DataType's physicalDataType should be private[sql] > -- > > Key: SPARK-42876 > URL: https://issues.apache.org/jira/browse/SPARK-42876 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42875: Assignee: Apache Spark > Fix toPandas to handle timezone and map types properly. > --- > > Key: SPARK-42875 > URL: https://issues.apache.org/jira/browse/SPARK-42875 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702922#comment-17702922 ] Apache Spark commented on SPARK-42875: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40497 > Fix toPandas to handle timezone and map types properly. > --- > > Key: SPARK-42875 > URL: https://issues.apache.org/jira/browse/SPARK-42875 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702921#comment-17702921 ] Apache Spark commented on SPARK-42875: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40497 > Fix toPandas to handle timezone and map types properly. > --- > > Key: SPARK-42875 > URL: https://issues.apache.org/jira/browse/SPARK-42875 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42875: Assignee: (was: Apache Spark) > Fix toPandas to handle timezone and map types properly. > --- > > Key: SPARK-42875 > URL: https://issues.apache.org/jira/browse/SPARK-42875 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42874) Enable new golden file test framework for analysis for all input files
[ https://issues.apache.org/jira/browse/SPARK-42874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702888#comment-17702888 ] Apache Spark commented on SPARK-42874: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/40496 > Enable new golden file test framework for analysis for all input files > -- > > Key: SPARK-42874 > URL: https://issues.apache.org/jira/browse/SPARK-42874 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42874) Enable new golden file test framework for analysis for all input files
[ https://issues.apache.org/jira/browse/SPARK-42874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42874: Assignee: Apache Spark > Enable new golden file test framework for analysis for all input files > -- > > Key: SPARK-42874 > URL: https://issues.apache.org/jira/browse/SPARK-42874 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42874) Enable new golden file test framework for analysis for all input files
[ https://issues.apache.org/jira/browse/SPARK-42874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42874: Assignee: (was: Apache Spark) > Enable new golden file test framework for analysis for all input files > -- > > Key: SPARK-42874 > URL: https://issues.apache.org/jira/browse/SPARK-42874 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42839: Assignee: (was: Apache Spark) > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > Attachments: Screenshot from 2023-03-21 00-20-11.png > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > exist yet. Check exception fields by using {*}checkError(){*}. The latter function > checks the valuable error fields only, and avoids dependencies on the error text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose a solution to users for how to avoid and fix such kinds of errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42839: Assignee: Apache Spark > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: starter > Attachments: Screenshot from 2023-03-21 00-20-11.png > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > exist yet. Check exception fields by using {*}checkError(){*}. The latter function > checks the valuable error fields only, and avoids dependencies on the error text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose a solution to users for how to avoid and fix such kinds of errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702841#comment-17702841 ] Apache Spark commented on SPARK-42839: -- User 'ruilibuaa' has created a pull request for this issue: https://github.com/apache/spark/pull/40493 > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > Attachments: Screenshot from 2023-03-21 00-20-11.png > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > exist yet. Check exception fields by using {*}checkError(){*}. The latter function > checks the valuable error fields only, and avoids dependencies on the error text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose a solution to users for how to avoid and fix such kinds of errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
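The workflow quoted in the SPARK-42839 notifications above hinges on asserting stable error fields rather than rendered message text. As a rough, hedged illustration of that principle in plain Scala (this is not Spark's actual `checkError()` helper; `ErrorInfo`, `ErrorChecks`, and the example error class name are all made up for this sketch):

```scala
// Minimal stand-in for field-based error checking; not Spark's real test API.
// `ErrorInfo` models just the fields a checkError()-style assertion cares about.
case class ErrorInfo(errorClass: String, parameters: Map[String, String], message: String)

object ErrorChecks {
  // Assert on the error class and its parameters, never on the rendered
  // message, so editors can reword message templates without breaking tests.
  def checkErrorFields(e: ErrorInfo, expectedClass: String,
                       expectedParams: Map[String, String]): Unit = {
    assert(e.errorClass == expectedClass,
      s"expected class $expectedClass, got ${e.errorClass}")
    assert(e.parameters == expectedParams,
      s"expected params $expectedParams, got ${e.parameters}")
  }
}
```

The point of the design: a test written this way keeps passing when only the human-readable wording in error-classes.json changes, which is exactly the decoupling the ticket description asks for.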
[jira] [Commented] (SPARK-42791) Create golden file test framework for analysis
[ https://issues.apache.org/jira/browse/SPARK-42791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702811#comment-17702811 ] Apache Spark commented on SPARK-42791: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40492 > Create golden file test framework for analysis > -- > > Key: SPARK-42791 > URL: https://issues.apache.org/jira/browse/SPARK-42791 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Fix For: 3.5.0 > > > Here we track the work to add new golden file test support for the Spark > analyzer. Each golden file can contain a list of SQL queries followed by the > string representations of their analyzed logical plans. > > This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping > after analysis and listing analyzed plans as the results instead of fully > executing queries end-to-end. As another example, ZetaSQL has analyzer-based > golden file testing like this as well [2]. > > This way, any changes to analysis will show up as test diffs, which are easy > to spot in review and also easy to update automatically. This could help the > community maintain the quality of Apache Spark's query analysis. > > [1] > [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala] > > [2] > [https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test]. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
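The golden-file idea described in the SPARK-42791 ticket above can be sketched as a tiny normalize-and-diff helper. This is only an illustration of the mechanism under stated assumptions, not Spark's actual harness (`SQLQueryTestSuite` and the analyzer framework differ in detail; `GoldenDiff` is a hypothetical name):

```scala
object GoldenDiff {
  // Normalize leading/trailing whitespace and blank lines so purely cosmetic
  // formatting changes in a plan string do not register as test diffs.
  def normalize(plan: String): String =
    plan.trim.linesIterator.map(_.trim).filter(_.nonEmpty).mkString("\n")

  // None means the analyzed plan still matches the stored golden text;
  // Some(diff) is what a reviewer would see (and could regenerate automatically).
  def compareToGolden(golden: String, actual: String): Option[String] =
    if (normalize(golden) == normalize(actual)) None
    else Some(s"--- golden\n${normalize(golden)}\n+++ actual\n${normalize(actual)}")
}
```

Because any analyzer change surfaces as a textual diff against the checked-in golden output, review amounts to reading the diff, and "update automatically" amounts to overwriting the golden file with the new actual output.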
[jira] [Assigned] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace
[ https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41006: Assignee: Apache Spark > ConfigMap has the same name when launching two pods on the same namespace > - > > Key: SPARK-41006 > URL: https://issues.apache.org/jira/browse/SPARK-41006 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Eric >Assignee: Apache Spark >Priority: Minor > > If we use the Spark Launcher to launch our spark apps in k8s: > {code:java} > val sparkLauncher = new InProcessLauncher() > .setMaster(k8sMaster) > .setDeployMode(deployMode) > .setAppName(appName) > .setVerbose(true) > sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code} > We have an issue when we launch another spark driver in the same namespace > where another Spark app was running: > {code:java} > kp -n audit-exporter-eee5073aac -w > NAME READY STATUS RESTARTS AGE > audit-exporter-71489e843d8085c0-driver 1/1 Running 0 > 9m54s > audit-exporter-7e6b8b843d80b9e6-exec-1 1/1 Running 0 > 9m40s > data-io-120204843d899567-driver 0/1 Terminating 0 1s > data-io-120204843d899567-driver 0/1 Terminating 0 2s > data-io-120204843d899567-driver 0/1 Terminating 0 3s > data-io-120204843d899567-driver 0/1 Terminating 0 > 3s{code} > The error is: > {code:java} > {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38: > 'data-io'","msg":"Application failed with > exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException: > Failure executing: PUT at: > https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map. > Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: > Forbidden: field is immutable when `immutable` is set.
Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: > field is immutable when `immutable` is set, reason=FieldValueForbidden, > additionalProperties={})], group=null, kind=ConfigMap, > name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=ConfigMap > \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is > immutable when `immutable` is set, metadata=ListMeta(_continue=null, > remainingItemCount=null, resourceVersion=null, selfLink=null, > additionalProperties={}), reason=Invalid, status=Failure, > additionalProperties={}).\n\tat > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat > > 
io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat > > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat > > io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteR
[jira] [Commented] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace
[ https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702710#comment-17702710 ] Apache Spark commented on SPARK-41006: -- User 'DHKold' has created a pull request for this issue: https://github.com/apache/spark/pull/40491 > ConfigMap has the same name when launching two pods on the same namespace > - > > Key: SPARK-41006 > URL: https://issues.apache.org/jira/browse/SPARK-41006 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Eric >Priority: Minor > > If we use the Spark Launcher to launch our spark apps in k8s: > {code:java} > val sparkLauncher = new InProcessLauncher() > .setMaster(k8sMaster) > .setDeployMode(deployMode) > .setAppName(appName) > .setVerbose(true) > sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code} > We have an issue when we launch another spark driver in the same namespace > where another Spark app was running: > {code:java} > kp -n audit-exporter-eee5073aac -w > NAME READY STATUS RESTARTS AGE > audit-exporter-71489e843d8085c0-driver 1/1 Running 0 > 9m54s > audit-exporter-7e6b8b843d80b9e6-exec-1 1/1 Running 0 > 9m40s > data-io-120204843d899567-driver 0/1 Terminating 0 1s > data-io-120204843d899567-driver 0/1 Terminating 0 2s > data-io-120204843d899567-driver 0/1 Terminating 0 3s > data-io-120204843d899567-driver 0/1 Terminating 0 > 3s{code} > The error is: > {code:java} > {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38: > 'data-io'","msg":"Application failed with > exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException: > Failure executing: PUT at: > https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map.
> Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: > Forbidden: field is immutable when `immutable` is set. Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: > field is immutable when `immutable` is set, reason=FieldValueForbidden, > additionalProperties={})], group=null, kind=ConfigMap, > name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=ConfigMap > \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is > immutable when `immutable` is set, metadata=ListMeta(_continue=null, > remainingItemCount=null, resourceVersion=null, selfLink=null, > additionalProperties={}), reason=Invalid, status=Failure, > additionalProperties={}).\n\tat > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown > Source)\n\tat > 
io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat > > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat
[jira] [Assigned] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace
[ https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41006: Assignee: (was: Apache Spark) > ConfigMap has the same name when launching two pods on the same namespace > - > > Key: SPARK-41006 > URL: https://issues.apache.org/jira/browse/SPARK-41006 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Eric >Priority: Minor > > If we use the Spark Launcher to launch our Spark apps in k8s: > {code:java} > val sparkLauncher = new InProcessLauncher() > .setMaster(k8sMaster) > .setDeployMode(deployMode) > .setAppName(appName) > .setVerbose(true) > sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code} > We hit an issue when we launch another Spark driver in a namespace > where another Spark app was already running: > {code:java} > kp -n audit-exporter-eee5073aac -w > NAME READY STATUS RESTARTS AGE > audit-exporter-71489e843d8085c0-driver 1/1 Running 0 > 9m54s > audit-exporter-7e6b8b843d80b9e6-exec-1 1/1 Running 0 > 9m40s > data-io-120204843d899567-driver 0/1 Terminating 0 1s > data-io-120204843d899567-driver 0/1 Terminating 0 2s > data-io-120204843d899567-driver 0/1 Terminating 0 3s > data-io-120204843d899567-driver 0/1 Terminating 0 > 3s{code} > The error is: > {code:java} > {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38: 'data-io'","msg":"Application failed with exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PUT at: https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map. > Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is immutable when `immutable` is set. 
> Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: field is immutable when `immutable` is set, reason=FieldValueForbidden, additionalProperties={})], group=null, kind=ConfigMap, name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is immutable when `immutable` is set, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)
> at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)
> at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)
> at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown Source)
> at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)
> at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)
> at io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown Source)
> at io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)
> at io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)
> at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)
> at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)
> at io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl
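The failure pattern above (the second driver's PUT targeting the first driver's already-created, immutable ConfigMap) is consistent with a resource-name prefix being shared across in-process launches in the same JVM. A minimal, hypothetical Python sketch of that failure mode — the names and naming scheme are illustrative, not Spark's actual internals:

```python
import time

class SharedPrefixNaming:
    """Hypothetical model: a resource-name prefix computed once per process
    (like a lazily initialized field on a process-wide object) is reused by
    every app launched in-process, so all of them get the same ConfigMap name."""
    _prefix = None

    @classmethod
    def config_map_name(cls):
        if cls._prefix is None:  # computed once, then cached for the process
            cls._prefix = "spark-drv-%x" % int(time.time() * 1000)
        return cls._prefix + "-conf-map"

def per_launch_config_map_name(app_name, launch_id):
    # Deriving the name from per-launch state avoids the clash.
    return "%s-%s-driver-conf-map" % (app_name, launch_id)

first = SharedPrefixNaming.config_map_name()   # e.g. launch of "audit-exporter"
second = SharedPrefixNaming.config_map_name()  # later launch of "data-io", same process
# first == second: the second driver tries to PUT over the first driver's
# immutable ConfigMap and the API server rejects it with 422 FieldValueForbidden.
```

The fix direction implied by the report is the second function's shape: make whatever feeds the ConfigMap name unique per launch rather than per process.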
[jira] [Assigned] (SPARK-42536) Upgrade log4j2 to 2.20.0
[ https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42536: Assignee: (was: Apache Spark) > Upgrade log4j2 to 2.20.0 > > > Key: SPARK-42536 > URL: https://issues.apache.org/jira/browse/SPARK-42536 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42536) Upgrade log4j2 to 2.20.0
[ https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42536: Assignee: Apache Spark > Upgrade log4j2 to 2.20.0 > > > Key: SPARK-42536 > URL: https://issues.apache.org/jira/browse/SPARK-42536 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42536) Upgrade log4j2 to 2.20.0
[ https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702667#comment-17702667 ] Apache Spark commented on SPARK-42536: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40490 > Upgrade log4j2 to 2.20.0 > > > Key: SPARK-42536 > URL: https://issues.apache.org/jira/browse/SPARK-42536 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42871) Upgrade slf4j to 2.0.7
[ https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702662#comment-17702662 ] Apache Spark commented on SPARK-42871: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40489 > Upgrade slf4j to 2.0.7 > -- > > Key: SPARK-42871 > URL: https://issues.apache.org/jira/browse/SPARK-42871 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > https://www.slf4j.org/news.html#2.0.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42871) Upgrade slf4j to 2.0.7
[ https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42871: Assignee: (was: Apache Spark) > Upgrade slf4j to 2.0.7 > -- > > Key: SPARK-42871 > URL: https://issues.apache.org/jira/browse/SPARK-42871 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > https://www.slf4j.org/news.html#2.0.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42871) Upgrade slf4j to 2.0.7
[ https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42871: Assignee: Apache Spark > Upgrade slf4j to 2.0.7 > -- > > Key: SPARK-42871 > URL: https://issues.apache.org/jira/browse/SPARK-42871 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > https://www.slf4j.org/news.html#2.0.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression
[ https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702658#comment-17702658 ] Apache Spark commented on SPARK-42851: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/40488 > EquivalentExpressions methods need to be consistently guarded by > supportedExpression > > > Key: SPARK-42851 > URL: https://issues.apache.org/jira/browse/SPARK-42851 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0 >Reporter: Kris Mok >Priority: Major > > SPARK-41468 tried to fix a bug but introduced a new regression. Its change to > {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the > {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same > guard to the other "add" entry point -- {{addExpr()}}. > As such, callers that add single expressions to CSE via {{addExpr()}} may > succeed, but upon retrieval via {{getExprState()}} they'd inconsistently get a > {{None}} due to failing the guard. > We need to make sure the "add" and "get" methods are consistent. It could be > done by one of: > 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or > 2. Removing the guard from {{getExprState()}}, relying solely on the guard on > the "add" path to make sure only intended state is added. > (or other alternative refactorings to fuse the guard into various methods to > make it more efficient) > There are pros and cons to the two directions above: because {{addExpr()}} > used to allow more (potentially incorrect) expressions to get CSE'd, making > it more restrictive may cause performance regressions (for the cases that > happened to work). 
> Example: > {code:sql} > select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) > from range(2) > {code} > Running this query on Spark 3.2 branch returns the correct value: > {code} > scala> spark.sql("select max(transform(array(id), x -> x)), > max(transform(array(id), x -> x)) from range(2)").collect > res0: Array[org.apache.spark.sql.Row] = > Array([WrappedArray(1),WrappedArray(1)]) > {code} > Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was > (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, > and {{getExprState()}} doesn't do extra guarding, so during physical > planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the > aggregation expression list and the result expressions list. > {code} > AdaptiveSparkPlan isFinalPlan=false > +- SortAggregate(key=[], functions=[max(transform(array(id#0L), > lambdafunction(lambda x#1L, lambda x#1L, false)))]) >+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11] > +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), > lambdafunction(lambda x#1L, lambda x#1L, false)))]) > +- Range (0, 2, step=1, splits=16) > {code} > Running the same query on current master triggers an error when binding the > result expression to the aggregate expression in the Aggregate operators (for > a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show > up during codegen): > {code} > ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 > (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): > java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), > lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in > [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, > false)))#3] > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)
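The add/get asymmetry the ticket describes can be illustrated with a small sketch. This is a hypothetical Python model of the behavior, not the Scala `EquivalentExpressions` class itself; expressions are modeled as plain strings:

```python
class EquivalentExpressionsSketch:
    """Hypothetical model of the inconsistency: the unguarded "add" path
    accepts an expression that the guarded "get" path then refuses to return."""

    def __init__(self):
        self._seen_counts = {}

    def _supported(self, expr):
        # Stand-in for supportedExpression(): reject lambda-bearing expressions.
        return "lambdafunction" not in expr

    def add_expr(self, expr):
        # Bug being modeled: no _supported() guard on this "add" entry point.
        self._seen_counts[expr] = self._seen_counts.get(expr, 0) + 1
        return self._seen_counts[expr] > 1  # True once seen more than once

    def get_expr_state(self, expr):
        # Guarded on retrieval, so the earlier add is silently invisible here.
        if not self._supported(expr):
            return None
        return self._seen_counts.get(expr)

cse = EquivalentExpressionsSketch()
expr = "transform(array(id), lambdafunction(x, x))"
cse.add_expr(expr)
seen_twice = cse.add_expr(expr)   # True: treated as a common subexpression
state = cse.get_expr_state(expr)  # None: the guard rejects it on retrieval
```

Either fix direction from the ticket restores the invariant: guard `add_expr` the same way, or drop the guard from `get_expr_state` and guard only the add paths, so that anything successfully added is also retrievable.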
[jira] [Commented] (SPARK-42340) Implement GroupedData.applyInPandas
[ https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702632#comment-17702632 ] Apache Spark commented on SPARK-42340: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40486 > Implement GroupedData.applyInPandas > --- > > Key: SPARK-42340 > URL: https://issues.apache.org/jira/browse/SPARK-42340 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
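For context on the API being brought to Spark Connect, typical `GroupedData.applyInPandas` usage looks like the following sketch. The per-group logic is kept as a plain function so it runs without a SparkSession; the Spark calls are shown commented out and assume a `spark` session plus pyspark and pandas installed:

```python
# Per-group transformation, kept pure so the core logic is testable standalone.
def subtract_mean_values(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def subtract_mean(pdf):
    # pdf is a pandas DataFrame holding the rows of one group.
    pdf = pdf.copy()
    pdf["v"] = subtract_mean_values(list(pdf["v"]))
    return pdf

# Usage with a SparkSession (classic or Connect):
# df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ("id", "v"))
# df.groupby("id").applyInPandas(subtract_mean, schema="id long, v double").show()
```

The function receives each group as a pandas DataFrame and must return a pandas DataFrame matching the declared schema; the Connect implementation tracked here serializes the function and runs it on the server side.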
[jira] [Assigned] (SPARK-42870) Move `toCatalystValue` to connect-common
[ https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42870: Assignee: (was: Apache Spark) > Move `toCatalystValue` to connect-common > > > Key: SPARK-42870 > URL: https://issues.apache.org/jira/browse/SPARK-42870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42870) Move `toCatalystValue` to connect-common
[ https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702598#comment-17702598 ] Apache Spark commented on SPARK-42870: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40485 > Move `toCatalystValue` to connect-common > > > Key: SPARK-42870 > URL: https://issues.apache.org/jira/browse/SPARK-42870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org