[jira] [Assigned] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used

2023-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42898:


Assignee: (was: Apache Spark)

> Cast from string to date and date to string say timezone is needed, but it is 
> not used
> --
>
> Key: SPARK-42898
> URL: https://issues.apache.org/jira/browse/SPARK-42898
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Robert Joseph Evans
>Priority: Major
>
> This is really minor: SPARK-35581 removed the need for a timezone when 
> casting from a `StringType` to a `DateType`, but the patch didn't update the 
> `needsTimeZone` function to indicate that it was no longer required.
> Currently, casting from a `DateType` to a `StringType` also says that it needs 
> the timezone, but it only uses the `DateFormatter` with its default 
> parameters, which do not use the time zone at all.
> I think this can be fixed with just a two-line change.
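
To illustrate why neither cast depends on a time zone, here is a minimal sketch in plain Python. It is not Spark's code: it only assumes that a date value is stored as a count of days since 1970-01-01, and the helper names `date_to_string` and `string_to_date` are hypothetical.

```python
from datetime import date, timedelta

# Assumption: a date is represented as a number of days since the Unix epoch.
EPOCH = date(1970, 1, 1)


def date_to_string(days_since_epoch: int) -> str:
    """Format a day count as yyyy-MM-dd; no time zone is consulted."""
    return (EPOCH + timedelta(days=days_since_epoch)).isoformat()


def string_to_date(text: str) -> int:
    """Parse a yyyy-MM-dd string into a day count; no time zone is consulted."""
    return (date.fromisoformat(text.strip()) - EPOCH).days


print(date_to_string(19438))
print(string_to_date("2023-03-22"))
```

Because both directions work purely on calendar days, a function like `needsTimeZone` would have no reason to report a time zone dependency for them.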



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used

2023-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703720#comment-17703720
 ] 

Apache Spark commented on SPARK-42898:
--

User 'revans2' has created a pull request for this issue:
https://github.com/apache/spark/pull/40524

> Cast from string to date and date to string say timezone is needed, but it is 
> not used
> --
>
> Key: SPARK-42898
> URL: https://issues.apache.org/jira/browse/SPARK-42898
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Robert Joseph Evans
>Priority: Major
>
> This is really minor: SPARK-35581 removed the need for a timezone when 
> casting from a `StringType` to a `DateType`, but the patch didn't update the 
> `needsTimeZone` function to indicate that it was no longer required.
> Currently, casting from a `DateType` to a `StringType` also says that it needs 
> the timezone, but it only uses the `DateFormatter` with its default 
> parameters, which do not use the time zone at all.
> I think this can be fixed with just a two-line change.






[jira] [Assigned] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used

2023-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42898:


Assignee: Apache Spark

> Cast from string to date and date to string say timezone is needed, but it is 
> not used
> --
>
> Key: SPARK-42898
> URL: https://issues.apache.org/jira/browse/SPARK-42898
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Robert Joseph Evans
>Assignee: Apache Spark
>Priority: Major
>
> This is really minor: SPARK-35581 removed the need for a timezone when 
> casting from a `StringType` to a `DateType`, but the patch didn't update the 
> `needsTimeZone` function to indicate that it was no longer required.
> Currently, casting from a `DateType` to a `StringType` also says that it needs 
> the timezone, but it only uses the `DateFormatter` with its default 
> parameters, which do not use the time zone at all.
> I think this can be fixed with just a two-line change.






[jira] [Commented] (SPARK-42897) Avoid evaluate more than once for the variables from the left side in the FullOuter SMJ condition

2023-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703686#comment-17703686
 ] 

Apache Spark commented on SPARK-42897:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/40523

> Avoid evaluate more than once for the variables from the left side in the 
> FullOuter SMJ condition
> -
>
> Key: SPARK-42897
> URL: https://issues.apache.org/jira/browse/SPARK-42897
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wan Kun
>Priority: Minor
>
> Codegen issue for FullOuter sort-merge join (SMJ), for example:
> {code}
> // In spark-shell; elsewhere, add `import spark.implicits._` for the $"..." syntax.
> val df1 = spark.range(5).select($"id".as("k1"))
> val df2 = spark.range(10).select($"id".as("k2"))
> df1.join(df2.hint("SHUFFLE_MERGE"),
>   $"k1" === $"k2" % 3 && $"k1" + 3 =!= $"k2" && $"k1" + 5 =!= $"k2",
>   "full_outer")
> {code}
> Both parts of the join condition *$"k1" + 3 =!= $"k2" && $"k1" + 5 =!= $"k2"* 
> evaluate the variable *k1*, which causes codegen to fail.






[jira] [Assigned] (SPARK-42897) Avoid evaluate more than once for the variables from the left side in the FullOuter SMJ condition

2023-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42897:


Assignee: Apache Spark

> Avoid evaluate more than once for the variables from the left side in the 
> FullOuter SMJ condition
> -
>
> Key: SPARK-42897
> URL: https://issues.apache.org/jira/browse/SPARK-42897
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wan Kun
>Assignee: Apache Spark
>Priority: Minor
>
> Codegen issue for FullOuter sort-merge join (SMJ), for example:
> {code}
> // In spark-shell; elsewhere, add `import spark.implicits._` for the $"..." syntax.
> val df1 = spark.range(5).select($"id".as("k1"))
> val df2 = spark.range(10).select($"id".as("k2"))
> df1.join(df2.hint("SHUFFLE_MERGE"),
>   $"k1" === $"k2" % 3 && $"k1" + 3 =!= $"k2" && $"k1" + 5 =!= $"k2",
>   "full_outer")
> {code}
> Both parts of the join condition *$"k1" + 3 =!= $"k2" && $"k1" + 5 =!= $"k2"* 
> evaluate the variable *k1*, which causes codegen to fail.






[jira] [Assigned] (SPARK-42897) Avoid evaluate more than once for the variables from the left side in the FullOuter SMJ condition

2023-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42897:


Assignee: (was: Apache Spark)

> Avoid evaluate more than once for the variables from the left side in the 
> FullOuter SMJ condition
> -
>
> Key: SPARK-42897
> URL: https://issues.apache.org/jira/browse/SPARK-42897
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wan Kun
>Priority: Minor
>
> Codegen issue for FullOuter sort-merge join (SMJ), for example:
> {code}
> // In spark-shell; elsewhere, add `import spark.implicits._` for the $"..." syntax.
> val df1 = spark.range(5).select($"id".as("k1"))
> val df2 = spark.range(10).select($"id".as("k2"))
> df1.join(df2.hint("SHUFFLE_MERGE"),
>   $"k1" === $"k2" % 3 && $"k1" + 3 =!= $"k2" && $"k1" + 5 =!= $"k2",
>   "full_outer")
> {code}
> Both parts of the join condition *$"k1" + 3 =!= $"k2" && $"k1" + 5 =!= $"k2"* 
> evaluate the variable *k1*, which causes codegen to fail.






[jira] [Commented] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage

2023-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703677#comment-17703677
 ] 

Apache Spark commented on SPARK-42101:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/40522

> Wrap InMemoryTableScanExec with QueryStage
> --
>
> Key: SPARK-42101
> URL: https://issues.apache.org/jira/browse/SPARK-42101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.5.0
>
>
> The first access to a cached plan that has AQE enabled is tricky. Currently, 
> we cannot preserve its output partitioning and ordering.
> The whole query plan also misses many optimizations in the AQE framework. 
> Wrapping InMemoryTableScanExec in a query stage can resolve all these issues.






[jira] [Commented] (SPARK-42896) Make `mapInPandas` / mapInArrow` support barrier mode execution

2023-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703637#comment-17703637
 ] 

Apache Spark commented on SPARK-42896:
--

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/40520

> Make `mapInPandas` / mapInArrow` support barrier mode execution
> ---
>
> Key: SPARK-42896
> URL: https://issues.apache.org/jira/browse/SPARK-42896
> Project: Spark
>  Issue Type: New Feature
>  Components: Pandas API on Spark, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Priority: Major
>
> Make `mapInPandas` / `mapInArrow` support barrier mode execution






[jira] [Assigned] (SPARK-42896) Make `mapInPandas` / mapInArrow` support barrier mode execution

2023-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42896:


Assignee: (was: Apache Spark)

> Make `mapInPandas` / mapInArrow` support barrier mode execution
> ---
>
> Key: SPARK-42896
> URL: https://issues.apache.org/jira/browse/SPARK-42896
> Project: Spark
>  Issue Type: New Feature
>  Components: Pandas API on Spark, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Priority: Major
>
> Make `mapInPandas` / `mapInArrow` support barrier mode execution






[jira] [Assigned] (SPARK-42896) Make `mapInPandas` / mapInArrow` support barrier mode execution

2023-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42896:


Assignee: Apache Spark

> Make `mapInPandas` / mapInArrow` support barrier mode execution
> ---
>
> Key: SPARK-42896
> URL: https://issues.apache.org/jira/browse/SPARK-42896
> Project: Spark
>  Issue Type: New Feature
>  Components: Pandas API on Spark, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Assignee: Apache Spark
>Priority: Major
>
> Make `mapInPandas` / `mapInArrow` support barrier mode execution






[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs

2023-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703615#comment-17703615
 ] 

Apache Spark commented on SPARK-42864:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40519

> Review and fix issues in MLlib API docs
> ---
>
> Key: SPARK-42864
> URL: https://issues.apache.org/jira/browse/SPARK-42864
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703541#comment-17703541
 ] 

Apache Spark commented on SPARK-42889:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40518

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-42508) Extract the common .ml classes to `mllib-common`

2023-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703503#comment-17703503
 ] 

Apache Spark commented on SPARK-42508:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40517

> Extract the common .ml classes to `mllib-common`
> 
>
> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42894:


Assignee: Apache Spark

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42894:


Assignee: (was: Apache Spark)

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Commented] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703493#comment-17703493
 ] 

Apache Spark commented on SPARK-42894:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40516

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Commented] (SPARK-42884) Add Ammonite REPL support

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703459#comment-17703459
 ] 

Apache Spark commented on SPARK-42884:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40515

> Add Ammonite REPL support
> -
>
> Key: SPARK-42884
> URL: https://issues.apache.org/jira/browse/SPARK-42884
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>







[jira] [Assigned] (SPARK-42884) Add Ammonite REPL support

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42884:


Assignee: Apache Spark  (was: Herman van Hövell)

> Add Ammonite REPL support
> -
>
> Key: SPARK-42884
> URL: https://issues.apache.org/jira/browse/SPARK-42884
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42884) Add Ammonite REPL support

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42884:


Assignee: Herman van Hövell  (was: Apache Spark)

> Add Ammonite REPL support
> -
>
> Key: SPARK-42884
> URL: https://issues.apache.org/jira/browse/SPARK-42884
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>







[jira] [Commented] (SPARK-41233) High-order function: array_prepend

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703454#comment-17703454
 ] 

Apache Spark commented on SPARK-41233:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40514

> High-order function: array_prepend
> --
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>
> Refer to 
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
> 1. Data type validation:
> In Snowflake’s array_append, array_prepend and array_insert functions, the 
> element data type does not need to match the data type of the existing 
> elements in the array.
> In Spark, however, we want to apply the same data type validation as 
> array_remove.
> 2. NULL handling:
> Currently, SparkSQL, SnowSQL and PostgreSQL deal with NULL values in 
> different ways.
> The existing functions array_contains, array_position and array_remove in 
> SparkSQL handle NULL as follows: if the input array and/or the element is 
> NULL, they return NULL. However, array_prepend should not follow this behavior.
> We should implement the NULL handling in array_prepend as follows:
> 2.1. If the array is NULL, return NULL.
> 2.2. If the array is not NULL and the element is NULL, prepend the NULL value 
> to the array.
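
The proposed NULL handling can be sketched in plain Python, with `None` standing in for SQL NULL. This is only an illustration of rules 2.1 and 2.2, not Spark's implementation; the function name `array_prepend_sketch` is hypothetical, and it assumes rule 2.2 means the NULL element is placed at the front of the array, as the function name suggests.

```python
from typing import Any, List, Optional


def array_prepend_sketch(array: Optional[List[Any]], element: Any) -> Optional[List[Any]]:
    """Illustrate the proposed NULL rules, using None as SQL NULL."""
    # Rule 2.1: a NULL array yields NULL, whatever the element is.
    if array is None:
        return None
    # Rule 2.2: a NULL element is kept and placed at the front of the array.
    # A new list is returned; the input list is left unchanged.
    return [element] + array


print(array_prepend_sketch(None, 1))
print(array_prepend_sketch([2, 3], None))
print(array_prepend_sketch([2, 3], 1))
```

The data type validation described in point 1 is deliberately left out of this sketch.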






[jira] [Assigned] (SPARK-42893) Block Arrow-optimized Python UDFs

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42893:


Assignee: Apache Spark

> Block Arrow-optimized Python UDFs
> -
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Considering the upcoming improvements to the result inconsistencies between 
> traditional pickled Python UDFs and Arrow-optimized Python UDFs, we'd better 
> block the feature; otherwise, users who try out the feature will face 
> behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced 
> in Spark 3.4, we'd better ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.






[jira] [Commented] (SPARK-42893) Block Arrow-optimized Python UDFs

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703446#comment-17703446
 ] 

Apache Spark commented on SPARK-42893:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40513

> Block Arrow-optimized Python UDFs
> -
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Considering the upcoming improvements to the result inconsistencies between 
> traditional pickled Python UDFs and Arrow-optimized Python UDFs, we'd better 
> block the feature; otherwise, users who try out the feature will face 
> behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced 
> in Spark 3.4, we'd better ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.






[jira] [Assigned] (SPARK-42893) Block Arrow-optimized Python UDFs

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42893:


Assignee: (was: Apache Spark)

> Block Arrow-optimized Python UDFs
> -
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Considering the upcoming improvements to the result inconsistencies between 
> traditional pickled Python UDFs and Arrow-optimized Python UDFs, we'd better 
> block the feature; otherwise, users who try out the feature will face 
> behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced 
> in Spark 3.4, we'd better ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.






[jira] [Assigned] (SPARK-42892) Move sameType and relevant methods out of DataType

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42892:


Assignee: Rui Wang  (was: Apache Spark)

> Move sameType and relevant methods out of DataType
> --
>
> Key: SPARK-42892
> URL: https://issues.apache.org/jira/browse/SPARK-42892
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42892) Move sameType and relevant methods out of DataType

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42892:


Assignee: Apache Spark  (was: Rui Wang)

> Move sameType and relevant methods out of DataType
> --
>
> Key: SPARK-42892
> URL: https://issues.apache.org/jira/browse/SPARK-42892
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42892) Move sameType and relevant methods out of DataType

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703436#comment-17703436
 ] 

Apache Spark commented on SPARK-42892:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40512

> Move sameType and relevant methods out of DataType
> --
>
> Key: SPARK-42892
> URL: https://issues.apache.org/jira/browse/SPARK-42892
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42891) Implement CoGrouped Map API

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42891:


Assignee: (was: Apache Spark)

> Implement CoGrouped Map API
> ---
>
> Key: SPARK-42891
> URL: https://issues.apache.org/jira/browse/SPARK-42891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement CoGrouped Map API.
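
For readers unfamiliar with the term, the following is a minimal pure-Python sketch of what a cogrouped map computes: both inputs are grouped by key, and a user function is called once per key with the two matching groups. It is an illustration only, not the Spark Connect implementation; `cogrouped_map` and `summarize` are hypothetical names. In PySpark the user-facing form is `df1.groupby(...).cogroup(df2.groupby(...)).applyInPandas(func, schema)`.

```python
from collections import defaultdict


def cogrouped_map(left, right, func):
    """Group two lists of (key, value) pairs by key and apply func per key.

    func receives (key, left_values, right_values) and returns a list of
    output rows. A key missing on one side gets an empty list for that side.
    """
    left_groups = defaultdict(list)
    right_groups = defaultdict(list)
    for key, value in left:
        left_groups[key].append(value)
    for key, value in right:
        right_groups[key].append(value)

    out = []
    # Visit every key seen on either side, in sorted order for determinism.
    for key in sorted(set(left_groups) | set(right_groups)):
        out.extend(func(key, left_groups.get(key, []), right_groups.get(key, [])))
    return out


def summarize(key, lefts, rights):
    # Hypothetical user function: one output row per key with both sums.
    return [(key, sum(lefts), sum(rights))]


result = cogrouped_map([(1, 10), (2, 20), (1, 5)], [(1, 100), (3, 300)], summarize)
print(result)
```

The sketch keeps everything in local memory; the point is only the shape of the contract (two grouped inputs, one function call per key).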






[jira] [Commented] (SPARK-42891) Implement CoGrouped Map API

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703429#comment-17703429
 ] 

Apache Spark commented on SPARK-42891:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40487

> Implement CoGrouped Map API
> ---
>
> Key: SPARK-42891
> URL: https://issues.apache.org/jira/browse/SPARK-42891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement CoGrouped Map API.






[jira] [Commented] (SPARK-42891) Implement CoGrouped Map API

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703428#comment-17703428
 ] 

Apache Spark commented on SPARK-42891:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40487

> Implement CoGrouped Map API
> ---
>
> Key: SPARK-42891
> URL: https://issues.apache.org/jira/browse/SPARK-42891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement CoGrouped Map API.






[jira] [Assigned] (SPARK-42891) Implement CoGrouped Map API

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42891:


Assignee: Apache Spark

> Implement CoGrouped Map API
> ---
>
> Key: SPARK-42891
> URL: https://issues.apache.org/jira/browse/SPARK-42891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Implement CoGrouped Map API.






[jira] [Commented] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703401#comment-17703401
 ] 

Apache Spark commented on SPARK-42888:
--

User 'cnauroth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40511

> Upgrade GCS connector from 2.2.7 to 2.2.11.
> ---
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2-based client authentication.






[jira] [Assigned] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42888:


Assignee: (was: Apache Spark)

> Upgrade GCS connector from 2.2.7 to 2.2.11.
> ---
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2-based client authentication.






[jira] [Assigned] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42888:


Assignee: Apache Spark

> Upgrade GCS connector from 2.2.7 to 2.2.11.
> ---
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Assignee: Apache Spark
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2-based client authentication.






[jira] [Commented] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703400#comment-17703400
 ] 

Apache Spark commented on SPARK-42888:
--

User 'cnauroth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40511

> Upgrade GCS connector from 2.2.7 to 2.2.11.
> ---
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2-based client authentication.






[jira] [Commented] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703392#comment-17703392
 ] 

Apache Spark commented on SPARK-42889:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40510

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Assigned] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42889:


Assignee: (was: Apache Spark)

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Assigned] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42889:


Assignee: Apache Spark

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42838:


Assignee: (was: Apache Spark)

> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if no such test exists 
> yet. Check the exception fields using {*}checkError(){*}. That function 
> checks only the significant error fields and avoids depending on the error 
> message text. This way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is not 
> clear. Propose to users how to avoid and fix this kind of error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
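
The idea behind checking error fields rather than message text can be sketched in a few lines of Python. This is a hedged illustration, not Spark's Scala `checkError()` helper: `SketchError`, `check_error`, `raise_example`, and the error class name `EXAMPLE_ERROR_NAME` are all hypothetical. It shows that a test comparing only the error class and its parameters keeps passing when the message wording changes.

```python
class SketchError(Exception):
    """Hypothetical exception carrying structured fields plus a message."""

    def __init__(self, error_class, parameters, message):
        super().__init__(message)
        self.error_class = error_class
        self.parameters = parameters


def check_error(exc, error_class, parameters):
    # Compare only the structured fields; the message text is ignored.
    assert exc.error_class == error_class, exc.error_class
    assert exc.parameters == parameters, exc.parameters


def raise_example(template):
    # The template stands in for the editable message format.
    params = {"value": "abc"}
    raise SketchError("EXAMPLE_ERROR_NAME", params, template.format(**params))


checked = []
# Two different wordings of the same error: the check passes for both.
for template in ("Bad value: {value}", "The value {value} is not valid."):
    try:
        raise_example(template)
    except SketchError as e:
        check_error(e, "EXAMPLE_ERROR_NAME", {"value": "abc"})
        checked.append(str(e))

print(checked)
```

A test that asserted on the rendered string instead would break on the second template, which is the coupling the issue asks contributors to avoid.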






[jira] [Commented] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703286#comment-17703286
 ] 

Apache Spark commented on SPARK-42838:
--

User 'unical1988' has created a pull request for this issue:
https://github.com/apache/spark/pull/40468

> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if no such test exists 
> yet. Check the exception fields using {*}checkError(){*}. That function 
> checks only the significant error fields and avoids depending on the error 
> message text. This way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is not 
> clear. Propose to users how to avoid and fix this kind of error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Assigned] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42838:


Assignee: Apache Spark

> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if no such test exists 
> yet. Check the exception fields using {*}checkError(){*}. That function 
> checks only the significant error fields and avoids depending on the error 
> message text. This way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is not 
> clear. Propose to users how to avoid and fix this kind of error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Assigned] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42885:


Assignee: (was: Apache Spark)

> Upgrade `kubernetes-client` to 6.5.1
> 
>
> Key: SPARK-42885
> URL: https://issues.apache.org/jira/browse/SPARK-42885
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Assigned] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42885:


Assignee: Apache Spark

> Upgrade `kubernetes-client` to 6.5.1
> 
>
> Key: SPARK-42885
> URL: https://issues.apache.org/jira/browse/SPARK-42885
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703272#comment-17703272
 ] 

Apache Spark commented on SPARK-42885:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40509

> Upgrade `kubernetes-client` to 6.5.1
> 
>
> Key: SPARK-42885
> URL: https://issues.apache.org/jira/browse/SPARK-42885
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Commented] (SPARK-42662) Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703112#comment-17703112
 ] 

Apache Spark commented on SPARK-42662:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40507

> Add `_distributed_sequence_id` for distributed-sequence index.
> --
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Turn `withSequenceColumn` into a PySpark internal API to support the 
> distributed-sequence index of the pandas API on Spark in Spark Connect as 
> well.
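
As background, a distributed-sequence index assigns consecutive ids 0, 1, 2, ... across partitions without moving the rows to one place. One common way to do this is sketched below in plain Python: count the rows per partition, turn the counts into starting offsets with a prefix sum, then number rows locally from each offset. This is an assumption-laden illustration, not the code behind `withSequenceColumn`; `distributed_sequence_ids` is a hypothetical name, and partitions are modelled as plain lists.

```python
from itertools import accumulate


def distributed_sequence_ids(partitions):
    """Attach a globally consecutive id to every row of every partition.

    partitions is a list of lists (one inner list per partition). Returns the
    same structure with each row replaced by an (id, row) pair.
    """
    # Step 1: each partition reports its row count (cheap, parallel in spirit).
    counts = [len(part) for part in partitions]
    # Step 2: exclusive prefix sum gives each partition's starting id.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Step 3: each partition numbers its own rows starting at its offset.
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]


parts = [["a", "b"], [], ["c"], ["d", "e"]]
indexed = distributed_sequence_ids(parts)
print(indexed)
```

Only the per-partition counts need to be collected centrally, which is why this scheme scales better than sorting or collecting all rows to assign ids.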






[jira] [Commented] (SPARK-42881) get_json_object Codegen Support

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703078#comment-17703078
 ] 

Apache Spark commented on SPARK-42881:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40506

> get_json_object Codegen Support
> ---
>
> Key: SPARK-42881
> URL: https://issues.apache.org/jira/browse/SPARK-42881
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-42881) get_json_object Codegen Support

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703077#comment-17703077
 ] 

Apache Spark commented on SPARK-42881:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40506

> get_json_object Codegen Support
> ---
>
> Key: SPARK-42881
> URL: https://issues.apache.org/jira/browse/SPARK-42881
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Assigned] (SPARK-42881) get_json_object Codegen Support

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42881:


Assignee: (was: Apache Spark)

> get_json_object Codegen Support
> ---
>
> Key: SPARK-42881
> URL: https://issues.apache.org/jira/browse/SPARK-42881
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42881) get_json_object Codegen Support

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42881:


Assignee: Apache Spark

> get_json_object Codegen Support
> ---
>
> Key: SPARK-42881
> URL: https://issues.apache.org/jira/browse/SPARK-42881
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42880) Improve the yarn document for lo4j2 configuration

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42880:


Assignee: Apache Spark

> Improve the yarn document for lo4j2 configuration
> -
>
> Key: SPARK-42880
> URL: https://issues.apache.org/jira/browse/SPARK-42880
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.3.2
>Reporter: Zhifang Li
>Assignee: Apache Spark
>Priority: Minor
>
> Since Spark 3.3 switched from log4j1 to log4j2, some documents should also be 
> updated. 
> For example, docs/running-on-yarn.md still uses log4j1 syntax, as follows:
> `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`.
>  






[jira] [Assigned] (SPARK-42880) Improve the yarn document for lo4j2 configuration

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42880:


Assignee: (was: Apache Spark)

> Improve the yarn document for lo4j2 configuration
> -
>
> Key: SPARK-42880
> URL: https://issues.apache.org/jira/browse/SPARK-42880
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.3.2
>Reporter: Zhifang Li
>Priority: Minor
>
> Since Spark 3.3 switched from log4j1 to log4j2, some documents should also be 
> updated. 
> For example, docs/running-on-yarn.md still uses log4j1 syntax, as follows:
> `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`.
>  






[jira] [Commented] (SPARK-42880) Improve the yarn document for lo4j2 configuration

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703071#comment-17703071
 ] 

Apache Spark commented on SPARK-42880:
--

User 'frankliee' has created a pull request for this issue:
https://github.com/apache/spark/pull/40504

> Improve the yarn document for lo4j2 configuration
> -
>
> Key: SPARK-42880
> URL: https://issues.apache.org/jira/browse/SPARK-42880
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.3.2
>Reporter: Zhifang Li
>Priority: Minor
>
> Since Spark 3.3 switched from log4j1 to log4j2, some documents should also be 
> updated. 
> For example, docs/running-on-yarn.md still uses log4j1 syntax, as follows:
> `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`.
>  






[jira] [Commented] (SPARK-42878) Named Table should support options

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703050#comment-17703050
 ] 

Apache Spark commented on SPARK-42878:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40498

> Named Table should support options
> --
>
> Key: SPARK-42878
> URL: https://issues.apache.org/jira/browse/SPARK-42878
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42878) Named Table should support options

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42878:


Assignee: Apache Spark  (was: Rui Wang)

> Named Table should support options
> --
>
> Key: SPARK-42878
> URL: https://issues.apache.org/jira/browse/SPARK-42878
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42878) Named Table should support options

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42878:


Assignee: Rui Wang  (was: Apache Spark)

> Named Table should support options
> --
>
> Key: SPARK-42878
> URL: https://issues.apache.org/jira/browse/SPARK-42878
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Commented] (SPARK-42878) Named Table should support options

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703048#comment-17703048
 ] 

Apache Spark commented on SPARK-42878:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40498

> Named Table should support options
> --
>
> Key: SPARK-42878
> URL: https://issues.apache.org/jira/browse/SPARK-42878
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42830) Link skipped stages on Spark UI

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42830:


Assignee: (was: Apache Spark)

> Link skipped stages on Spark UI
> ---
>
> Key: SPARK-42830
> URL: https://issues.apache.org/jira/browse/SPARK-42830
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
>
> Add a link to the skipped Spark stages so that it's easier to find the 
> execution details in the UI.






[jira] [Assigned] (SPARK-42830) Link skipped stages on Spark UI

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42830:


Assignee: Apache Spark

> Link skipped stages on Spark UI
> ---
>
> Key: SPARK-42830
> URL: https://issues.apache.org/jira/browse/SPARK-42830
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Assignee: Apache Spark
>Priority: Major
>
> Add a link to the skipped Spark stages so that it's easier to find the 
> execution details in the UI.






[jira] [Commented] (SPARK-42830) Link skipped stages on Spark UI

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703046#comment-17703046
 ] 

Apache Spark commented on SPARK-42830:
--

User 'yliou' has created a pull request for this issue:
https://github.com/apache/spark/pull/40503

> Link skipped stages on Spark UI
> ---
>
> Key: SPARK-42830
> URL: https://issues.apache.org/jira/browse/SPARK-42830
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
>
> Add a link to the skipped Spark stages so that it's easier to find the 
> execution details in the UI.






[jira] [Assigned] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42829:


Assignee: Apache Spark

> Added Identifier to the cached RDD operator on the Stages page 
> ---
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Assignee: Apache Spark
>Priority: Major
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the Stages page in the Web UI, there is no indication of which cached 
> RDD is being executed in a particular stage. This Jira aims to add a repeat 
> identifier to distinguish which cached RDD is being executed.






[jira] [Commented] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703039#comment-17703039
 ] 

Apache Spark commented on SPARK-42829:
--

User 'yliou' has created a pull request for this issue:
https://github.com/apache/spark/pull/40502

> Added Identifier to the cached RDD operator on the Stages page 
> ---
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the Stages page in the Web UI, there is no indication of which cached 
> RDD is being executed in a particular stage. This Jira aims to add a repeat 
> identifier to distinguish which cached RDD is being executed.






[jira] [Assigned] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42829:


Assignee: (was: Apache Spark)

> Added Identifier to the cached RDD operator on the Stages page 
> ---
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the Stages page in the Web UI, there is no indication of which cached 
> RDD is being executed in a particular stage. This Jira aims to add a repeat 
> identifier to distinguish which cached RDD is being executed.






[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703026#comment-17703026
 ] 

Apache Spark commented on SPARK-42864:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40501

> Review and fix issues in MLlib API docs
> ---
>
> Key: SPARK-42864
> URL: https://issues.apache.org/jira/browse/SPARK-42864
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703017#comment-17703017
 ] 

Apache Spark commented on SPARK-42864:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40500

> Review and fix issues in MLlib API docs
> ---
>
> Key: SPARK-42864
> URL: https://issues.apache.org/jira/browse/SPARK-42864
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703018#comment-17703018
 ] 

Apache Spark commented on SPARK-42864:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40500

> Review and fix issues in MLlib API docs
> ---
>
> Key: SPARK-42864
> URL: https://issues.apache.org/jira/browse/SPARK-42864
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42864) Review and fix issues in MLlib API docs

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42864:


Assignee: Apache Spark  (was: Ruifeng Zheng)

> Review and fix issues in MLlib API docs
> ---
>
> Key: SPARK-42864
> URL: https://issues.apache.org/jira/browse/SPARK-42864
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42864) Review and fix issues in MLlib API docs

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42864:


Assignee: Ruifeng Zheng  (was: Apache Spark)

> Review and fix issues in MLlib API docs
> ---
>
> Key: SPARK-42864
> URL: https://issues.apache.org/jira/browse/SPARK-42864
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42876) DataType's physicalDataType should be private[sql]

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42876:


Assignee: Rui Wang  (was: Apache Spark)

> DataType's physicalDataType should be private[sql]
> --
>
> Key: SPARK-42876
> URL: https://issues.apache.org/jira/browse/SPARK-42876
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42876) DataType's physicalDataType should be private[sql]

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42876:


Assignee: Apache Spark  (was: Rui Wang)

> DataType's physicalDataType should be private[sql]
> --
>
> Key: SPARK-42876
> URL: https://issues.apache.org/jira/browse/SPARK-42876
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42876) DataType's physicalDataType should be private[sql]

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702978#comment-17702978
 ] 

Apache Spark commented on SPARK-42876:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40499

> DataType's physicalDataType should be private[sql]
> --
>
> Key: SPARK-42876
> URL: https://issues.apache.org/jira/browse/SPARK-42876
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42875) Fix toPandas to handle timezone and map types properly.

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42875:


Assignee: Apache Spark

> Fix toPandas to handle timezone and map types properly.
> ---
>
> Key: SPARK-42875
> URL: https://issues.apache.org/jira/browse/SPARK-42875
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42875) Fix toPandas to handle timezone and map types properly.

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702922#comment-17702922
 ] 

Apache Spark commented on SPARK-42875:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40497

> Fix toPandas to handle timezone and map types properly.
> ---
>
> Key: SPARK-42875
> URL: https://issues.apache.org/jira/browse/SPARK-42875
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Commented] (SPARK-42875) Fix toPandas to handle timezone and map types properly.

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702921#comment-17702921
 ] 

Apache Spark commented on SPARK-42875:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40497

> Fix toPandas to handle timezone and map types properly.
> ---
>
> Key: SPARK-42875
> URL: https://issues.apache.org/jira/browse/SPARK-42875
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Assigned] (SPARK-42875) Fix toPandas to handle timezone and map types properly.

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42875:


Assignee: (was: Apache Spark)

> Fix toPandas to handle timezone and map types properly.
> ---
>
> Key: SPARK-42875
> URL: https://issues.apache.org/jira/browse/SPARK-42875
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Commented] (SPARK-42874) Enable new golden file test framework for analysis for all input files

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702888#comment-17702888
 ] 

Apache Spark commented on SPARK-42874:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/40496

> Enable new golden file test framework for analysis for all input files
> --
>
> Key: SPARK-42874
> URL: https://issues.apache.org/jira/browse/SPARK-42874
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>







[jira] [Assigned] (SPARK-42874) Enable new golden file test framework for analysis for all input files

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42874:


Assignee: Apache Spark

> Enable new golden file test framework for analysis for all input files
> --
>
> Key: SPARK-42874
> URL: https://issues.apache.org/jira/browse/SPARK-42874
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42874) Enable new golden file test framework for analysis for all input files

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42874:


Assignee: (was: Apache Spark)

> Enable new golden file test framework for analysis for all input files
> --
>
> Key: SPARK-42874
> URL: https://issues.apache.org/jira/browse/SPARK-42874
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>







[jira] [Assigned] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42839:


Assignee: (was: Apache Spark)

> Assign a name to the error class _LEGACY_ERROR_TEMP_2003
> 
>
> Key: SPARK-42839
> URL: https://issues.apache.org/jira/browse/SPARK-42839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
> Attachments: Screenshot from 2023-03-21 00-20-11.png
>
>
> Choose a proper name for the error class `_LEGACY_ERROR_TEMP_2003` defined in 
> `core/src/main/resources/error/error-classes.json`. The name should be short 
> but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if no such test exists yet. 
> Check the exception fields with `checkError()`. That function checks only the 
> meaningful error fields and avoids depending on the error message text, so 
> tech editors can change the error format in error-classes.json without 
> worrying about Spark's internal tests. Migrate other tests that might trigger 
> the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see `SparkException.internalError()`.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Suggest to users how to avoid and fix this kind of error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
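
The reasoning behind `checkError()` above (assert on structured error fields, not on message text) can be illustrated with a small standalone sketch. This is not Spark's test API: `StructuredError`, `check_error`, `make_error`, and the error name `EXAMPLE_ERROR_NAME` are hypothetical stand-ins chosen only for illustration.

```python
class StructuredError(Exception):
    """Hypothetical exception that carries an error class and parameters."""

    def __init__(self, error_class, parameters, template):
        self.error_class = error_class
        self.parameters = dict(parameters)
        # The human-readable text is rendered from a template that may change.
        super().__init__(template.format(**self.parameters))


def check_error(exc, error_class, parameters):
    """Compare only the structured fields; the message text is ignored."""
    if exc.error_class != error_class:
        raise AssertionError(
            f"expected error class {error_class!r}, got {exc.error_class!r}")
    if exc.parameters != parameters:
        raise AssertionError(
            f"expected parameters {parameters!r}, got {exc.parameters!r}")


def make_error(template):
    """Build the same logical error with a given message template."""
    return StructuredError("EXAMPLE_ERROR_NAME", {"value": "abc"}, template)


# Two wordings of the same error: the check passes for both, so rewording
# the template does not break the test.
old = make_error("Cannot handle {value}.")
new = make_error("The value '{value}' is not supported; please check the input.")
check_error(old, "EXAMPLE_ERROR_NAME", {"value": "abc"})
check_error(new, "EXAMPLE_ERROR_NAME", {"value": "abc"})
```

A test that compared `str(old)` against a fixed string would fail as soon as the template was reworded, which is the coupling the description asks contributors to avoid.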






[jira] [Assigned] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42839:


Assignee: Apache Spark

> Assign a name to the error class _LEGACY_ERROR_TEMP_2003
> 
>
> Key: SPARK-42839
> URL: https://issues.apache.org/jira/browse/SPARK-42839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
> Attachments: Screenshot from 2023-03-21 00-20-11.png
>
>
> Choose a proper name for the error class `_LEGACY_ERROR_TEMP_2003` defined in 
> `core/src/main/resources/error/error-classes.json`. The name should be short 
> but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if no such test exists yet. 
> Check the exception fields with `checkError()`. That function checks only the 
> meaningful error fields and avoids depending on the error message text, so 
> tech editors can change the error format in error-classes.json without 
> worrying about Spark's internal tests. Migrate other tests that might trigger 
> the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see `SparkException.internalError()`.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Suggest to users how to avoid and fix this kind of error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Commented] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702841#comment-17702841
 ] 

Apache Spark commented on SPARK-42839:
--

User 'ruilibuaa' has created a pull request for this issue:
https://github.com/apache/spark/pull/40493

> Assign a name to the error class _LEGACY_ERROR_TEMP_2003
> 
>
> Key: SPARK-42839
> URL: https://issues.apache.org/jira/browse/SPARK-42839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
> Attachments: Screenshot from 2023-03-21 00-20-11.png
>
>
> Choose a proper name for the error class `_LEGACY_ERROR_TEMP_2003` defined in 
> `core/src/main/resources/error/error-classes.json`. The name should be short 
> but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if no such test exists yet. 
> Check the exception fields with `checkError()`. That function checks only the 
> meaningful error fields and avoids depending on the error message text, so 
> tech editors can change the error format in error-classes.json without 
> worrying about Spark's internal tests. Migrate other tests that might trigger 
> the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see `SparkException.internalError()`.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Suggest to users how to avoid and fix this kind of error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Commented] (SPARK-42791) Create golden file test framework for analysis

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702811#comment-17702811
 ] 

Apache Spark commented on SPARK-42791:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40492

> Create golden file test framework for analysis
> --
>
> Key: SPARK-42791
> URL: https://issues.apache.org/jira/browse/SPARK-42791
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
> Fix For: 3.5.0
>
>
> Here we track the work to add new golden file test support for the Spark 
> analyzer. Each golden file can contain a list of SQL queries followed by the 
> string representations of their analyzed logical plans.
>  
> This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping 
> after analysis and listing analyzed plans as the results instead of fully 
> executing queries end-to-end. As another example, ZetaSQL has analyzer-based 
> golden file testing like this as well [2].
>  
> This way, any changes to analysis will show up as test diffs, which are easy 
> to spot in review and also easy to update automatically. This could help the 
> community collectively maintain the quality of Apache Spark's query analysis.
>  
> [1] 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala]
>  
> [2] 
> [https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test].
>  
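
The golden-file approach described above can be sketched in a few lines. This is only an illustration of the general technique, not Spark's framework: `analyze` is a hypothetical stand-in for an analyzer, and `render`, `check_golden`, the `-- query` / `-- plan` markers, and the file name are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def analyze(query):
    """Hypothetical analyzer: returns a string form of an 'analyzed plan'."""
    return "Plan(" + " ".join(query.upper().split()) + ")"


def render(queries):
    """Render each query followed by its plan, in golden-file form."""
    sections = []
    for query in queries:
        sections.append(f"-- query\n{query}\n-- plan\n{analyze(query)}\n")
    return "\n".join(sections)


def check_golden(queries, golden_path, regenerate=False):
    """Compare current output with the golden file, or rewrite it on request."""
    actual = render(queries)
    if regenerate:
        golden_path.write_text(actual)
        return True
    return golden_path.exists() and golden_path.read_text() == actual


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "limit.sql.out"
    queries = ["select 1", "select a from t limit 5"]
    check_golden(queries, golden, regenerate=True)            # write the baseline
    unchanged = check_golden(queries, golden)                 # same output as baseline
    changed = check_golden(queries + ["select 2"], golden)    # output now differs
```

Any change in what `analyze` produces makes the comparison fail until the file is regenerated, so the change appears as a diff of the golden file during review.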






[jira] [Assigned] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41006:


Assignee: Apache Spark

> ConfigMap has the same name when launching two pods on the same namespace
> -
>
> Key: SPARK-41006
> URL: https://issues.apache.org/jira/browse/SPARK-41006
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Eric
>Assignee: Apache Spark
>Priority: Minor
>
> If we use the Spark Launcher to launch our spark apps in k8s:
> {code:java}
> val sparkLauncher = new InProcessLauncher()
>  .setMaster(k8sMaster)
>  .setDeployMode(deployMode)
>  .setAppName(appName)
>  .setVerbose(true)
> sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code}
> We have an issue when we launch another Spark driver in the same namespace 
> where another Spark app is running:
> {code:java}
> kp -n audit-exporter-eee5073aac -w
> NAME                                     READY   STATUS        RESTARTS   AGE
> audit-exporter-71489e843d8085c0-driver   1/1     Running       0          9m54s
> audit-exporter-7e6b8b843d80b9e6-exec-1   1/1     Running       0          9m40s
> data-io-120204843d899567-driver          0/1     Terminating   0          1s
> data-io-120204843d899567-driver          0/1     Terminating   0          2s
> data-io-120204843d899567-driver          0/1     Terminating   0          3s
> data-io-120204843d899567-driver          0/1     Terminating   0          3s{code}
> The error is:
> {code:java}
> {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38:
>  'data-io'","msg":"Application failed with 
> exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: PUT at: 
> https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map.
>  Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: 
> Forbidden: field is immutable when `immutable` is set. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: 
> field is immutable when `immutable` is set, reason=FieldValueForbidden, 
> additionalProperties={})], group=null, kind=ConfigMap, 
> name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=ConfigMap 
> \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is 
> immutable when `immutable` is set, metadata=ListMeta(_continue=null, 
> remainingItemCount=null, resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}).\n\tat 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat
>  
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat
>  
> 

[jira] [Commented] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702710#comment-17702710
 ] 

Apache Spark commented on SPARK-41006:
--

User 'DHKold' has created a pull request for this issue:
https://github.com/apache/spark/pull/40491

> ConfigMap has the same name when launching two pods on the same namespace
> -
>
> Key: SPARK-41006
> URL: https://issues.apache.org/jira/browse/SPARK-41006
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Eric
>Priority: Minor
>
> If we use the Spark Launcher to launch our spark apps in k8s:
> {code:java}
> val sparkLauncher = new InProcessLauncher()
>  .setMaster(k8sMaster)
>  .setDeployMode(deployMode)
>  .setAppName(appName)
>  .setVerbose(true)
> sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code}
> We have an issue when we launch another Spark driver in the same namespace 
> where another Spark app is already running:
> {code:java}
> kp -n audit-exporter-eee5073aac -w
> NAME                                     READY   STATUS        RESTARTS   AGE
> audit-exporter-71489e843d8085c0-driver   1/1     Running       0          9m54s
> audit-exporter-7e6b8b843d80b9e6-exec-1   1/1     Running       0          9m40s
> data-io-120204843d899567-driver          0/1     Terminating   0          1s
> data-io-120204843d899567-driver          0/1     Terminating   0          2s
> data-io-120204843d899567-driver          0/1     Terminating   0          3s
> data-io-120204843d899567-driver          0/1     Terminating   0          3s
> {code}
> The error is:
> {code:java}
> {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38:
>  'data-io'","msg":"Application failed with 
> exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: PUT at: 
> https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map.
>  Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: 
> Forbidden: field is immutable when `immutable` is set. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: 
> field is immutable when `immutable` is set, reason=FieldValueForbidden, 
> additionalProperties={})], group=null, kind=ConfigMap, 
> name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=ConfigMap 
> \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is 
> immutable when `immutable` is set, metadata=ListMeta(_continue=null, 
> remainingItemCount=null, resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}).\n\tat 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat
>  
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat
>  
> 
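
Note on the report above: the failing PUT targets a ConfigMap whose name was already used by the running driver. The report does not say why the name repeats. One possible mechanism, assumed here purely for illustration, is that the name is computed once per JVM and then reused by every in-process launch. The Scala sketch below contrasts a name cached in a `val` with one regenerated by a `def`. `CachedName`, `FreshName` and `newId` are hypothetical names, not Spark's code.

```scala
import java.util.UUID

object NameIds {
  // Hypothetical helper: 16 hex characters taken from a random UUID.
  def newId(): String = UUID.randomUUID().toString.replace("-", "").take(16)
}

// Assumption: the ConfigMap name lives in a `val`, so it is evaluated once
// per JVM and shared by every application launched in that JVM.
object CachedName {
  val configMapName: String = s"spark-drv-${NameIds.newId()}-conf-map"
}

// Alternative: a `def` is re-evaluated on each call, so every launch
// gets its own name.
object FreshName {
  def configMapName: String = s"spark-drv-${NameIds.newId()}-conf-map"
}

object ConfigMapNameDemo {
  def main(args: Array[String]): Unit = {
    // Two in-process "launches" reading the cached name see the same value.
    val cachedCollide = CachedName.configMapName == CachedName.configMapName
    // Two reads of the regenerated name differ (barring a random collision).
    val freshCollide = FreshName.configMapName == FreshName.configMapName
    println(s"cached names collide: $cachedCollide")
    println(s"fresh names collide:  $freshCollide")
  }
}
```

If the assumption holds, a second launch would try to replace the first application's immutable ConfigMap, which would match the 422 response quoted in the report.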

[jira] [Assigned] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41006:


Assignee: (was: Apache Spark)


[jira] [Assigned] (SPARK-42536) Upgrade log4j2 to 2.20.0

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42536:


Assignee: (was: Apache Spark)

> Upgrade log4j2 to 2.20.0
> 
>
> Key: SPARK-42536
> URL: https://issues.apache.org/jira/browse/SPARK-42536
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html]






[jira] [Assigned] (SPARK-42536) Upgrade log4j2 to 2.20.0

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42536:


Assignee: Apache Spark

> Upgrade log4j2 to 2.20.0
> 
>
> Key: SPARK-42536
> URL: https://issues.apache.org/jira/browse/SPARK-42536
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html]






[jira] [Commented] (SPARK-42536) Upgrade log4j2 to 2.20.0

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702667#comment-17702667
 ] 

Apache Spark commented on SPARK-42536:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40490

> Upgrade log4j2 to 2.20.0
> 
>
> Key: SPARK-42536
> URL: https://issues.apache.org/jira/browse/SPARK-42536
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html]






[jira] [Commented] (SPARK-42871) Upgrade slf4j to 2.0.7

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702662#comment-17702662
 ] 

Apache Spark commented on SPARK-42871:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40489

> Upgrade slf4j to 2.0.7
> --
>
> Key: SPARK-42871
> URL: https://issues.apache.org/jira/browse/SPARK-42871
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> https://www.slf4j.org/news.html#2.0.7






[jira] [Assigned] (SPARK-42871) Upgrade slf4j to 2.0.7

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42871:


Assignee: (was: Apache Spark)

> Upgrade slf4j to 2.0.7
> --
>
> Key: SPARK-42871
> URL: https://issues.apache.org/jira/browse/SPARK-42871
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> https://www.slf4j.org/news.html#2.0.7






[jira] [Assigned] (SPARK-42871) Upgrade slf4j to 2.0.7

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42871:


Assignee: Apache Spark

> Upgrade slf4j to 2.0.7
> --
>
> Key: SPARK-42871
> URL: https://issues.apache.org/jira/browse/SPARK-42871
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> https://www.slf4j.org/news.html#2.0.7







[jira] [Commented] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702658#comment-17702658
 ] 

Apache Spark commented on SPARK-42851:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40488

> EquivalentExpressions methods need to be consistently guarded by 
> supportedExpression
> 
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kris Mok
>Priority: Major
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
> {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
> {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same 
> guard to the other "add" entry point -- {{addExpr()}}.
> As such, uses that add single expressions to CSE via {{addExpr()}} may 
> succeed, but upon retrieval via {{getExprState()}} it'd inconsistently get a 
> {{None}} due to failing the guard.
> We need to make sure the "add" and "get" methods are consistent. It could be 
> done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
> the "add" path to make sure only intended state is added.
> (or other alternative refactorings to fuse the guard into various methods to 
> make it more efficient)
> There are pros and cons to the two directions above: because {{addExpr()}} 
> used to allow more (potentially incorrect) expressions to get CSE'd, making 
> it more restrictive may cause performance regressions (for the cases that 
> happened to work).
> Example:
> {code:sql}
> select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
> from range(2)
> {code}
> Running this query on Spark 3.2 branch returns the correct value:
> {code}
> scala> spark.sql("select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) from range(2)").collect
> res0: Array[org.apache.spark.sql.Row] = Array([WrappedArray(1),WrappedArray(1)])
> {code}
> Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was 
> (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, 
> and {{getExprState()}} doesn't do extra guarding, so during physical 
> planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the 
> aggregation expression list and the result expressions list.
> {code}
> AdaptiveSparkPlan isFinalPlan=false
> +- SortAggregate(key=[], functions=[max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, false)))])
>    +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>       +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, false)))])
>          +- Range (0, 2, step=1, splits=16)
> {code}
> Running the same query on current master triggers an error when binding the 
> result expression to the aggregate expression in the Aggregate operators (for 
> a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show 
> up during codegen):
> {code}
> ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, false)))#3]
>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
>   at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)
>   at 
> 
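
Note on the report above: the inconsistency between the "add" and "get" paths can be illustrated with a toy model. The Scala sketch below is not Spark's `EquivalentExpressions`. `Expr`, `ToyEquivalentExpressions` and the `guardAddExpr` flag are hypothetical, and only the method names are borrowed from the report. With the flag off, `addExpr()` accepts an expression that `getExprState()` later refuses to return. With the flag on (option 1 in the report), both paths agree.

```scala
import scala.collection.mutable

// Toy expression: `supported` stands in for whatever the real
// supportedExpression() check decides.
final case class Expr(name: String, supported: Boolean)

class ToyEquivalentExpressions(guardAddExpr: Boolean) {
  // Maps each recorded expression to the number of times it was added.
  private val states = mutable.Map.empty[Expr, Int]

  private def supportedExpression(e: Expr): Boolean = e.supported

  // "add" entry point. Returns true if `e` had already been recorded.
  // The guard is applied only when guardAddExpr is true.
  def addExpr(e: Expr): Boolean =
    if (guardAddExpr && !supportedExpression(e)) {
      false
    } else {
      val seen = states.contains(e)
      states.update(e, states.getOrElse(e, 0) + 1)
      seen
    }

  // "get" entry point. Always guarded, as described in the report.
  def getExprState(e: Expr): Option[Int] =
    if (supportedExpression(e)) states.get(e) else None
}

object GuardDemo {
  def main(args: Array[String]): Unit = {
    val e = Expr("unsupported-expr", supported = false)

    val unguarded = new ToyEquivalentExpressions(guardAddExpr = false)
    unguarded.addExpr(e)
    // The second add reports a common subexpression, yet the lookup finds nothing.
    println(s"unguarded add: ${unguarded.addExpr(e)}, get: ${unguarded.getExprState(e)}")

    val guarded = new ToyEquivalentExpressions(guardAddExpr = true)
    guarded.addExpr(e)
    // Both paths now reject the expression consistently.
    println(s"guarded add: ${guarded.addExpr(e)}, get: ${guarded.getExprState(e)}")
  }
}
```

The sketch covers only option 1. Option 2 would instead drop the check from `getExprState()` and rely on the "add" path alone.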

[jira] [Commented] (SPARK-42340) Implement GroupedData.applyInPandas

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702632#comment-17702632
 ] 

Apache Spark commented on SPARK-42340:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40486

> Implement GroupedData.applyInPandas
> ---
>
> Key: SPARK-42340
> URL: https://issues.apache.org/jira/browse/SPARK-42340
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-42870) Move `toCatalystValue` to connect-common

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42870:


Assignee: (was: Apache Spark)

> Move `toCatalystValue` to connect-common
> 
>
> Key: SPARK-42870
> URL: https://issues.apache.org/jira/browse/SPARK-42870
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-42870) Move `toCatalystValue` to connect-common

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702598#comment-17702598
 ] 

Apache Spark commented on SPARK-42870:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40485

> Move `toCatalystValue` to connect-common
> 
>
> Key: SPARK-42870
> URL: https://issues.apache.org/jira/browse/SPARK-42870
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>






