[jira] [Created] (SPARK-45089) Remove obsolete repo of DB2 JDBC driver
Cheng Pan created SPARK-45089: - Summary: Remove obsolete repo of DB2 JDBC driver Key: SPARK-45089 URL: https://issues.apache.org/jira/browse/SPARK-45089 Project: Spark Issue Type: Test Components: Build, Tests Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44833) Spark Connect reattach when initial ExecutePlan didn't reach server doing too eager Reattach
[ https://issues.apache.org/jira/browse/SPARK-44833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762282#comment-17762282 ] Aparna Garg commented on SPARK-44833: - User 'juliuszsompolski' has created a pull request for this issue: https://github.com/apache/spark/pull/42806 > Spark Connect reattach when initial ExecutePlan didn't reach server doing too > eager Reattach > > > Key: SPARK-44833 > URL: https://issues.apache.org/jira/browse/SPARK-44833 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 4.0.0, 3.5.1 > > > In > {code:java} > case ex: StatusRuntimeException > if Option(StatusProto.fromThrowable(ex)) > .exists(_.getMessage.contains("INVALID_HANDLE.OPERATION_NOT_FOUND")) => > if (lastReturnedResponseId.isDefined) { > throw new IllegalStateException( > "OPERATION_NOT_FOUND on the server but responses were already received > from it.", > ex) > } > // Try a new ExecutePlan, and throw upstream for retry. > -> iter = rawBlockingStub.executePlan(initialRequest) > -> throw new GrpcRetryHandler.RetryException {code} > we call executePlan, and throw RetryException to have an exception handled > upstream. > Then it goes to > {code:java} > retry { > if (firstTry) { > // on first try, we use the existing iter. > firstTry = false > } else { > // on retry, the iter is borked, so we need a new one > ->iter = rawBlockingStub.reattachExecute(createReattachExecuteRequest()) > } {code} > and because it's not firstTry, immediately does reattach. > This causes no failure - the reattach will work and attach to the query, the > original executePlan will get detached. But it could be improved. > Same issue is also present in python reattach.py. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
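A self-contained sketch of the control flow quoted in the issue above. `newExecutePlan`/`newReattach` stand in for the client's gRPC stub calls (`rawBlockingStub.executePlan` / `reattachExecute`), and the `freshIterator` flag is a hypothetical illustration of one way to avoid the eager reattach; it models the problem, not the actual change made in PR 42806.
{code:scala}
object ReattachFlowSketch {
  final class RetryException extends RuntimeException

  // Stand-ins for the gRPC stub calls; each returns a response iterator.
  def newExecutePlan(): Iterator[String] = Iterator("response from ExecutePlan")
  def newReattach(): Iterator[String] = Iterator("response from ReattachExecute")

  def main(args: Array[String]): Unit = {
    var firstTry = true
    var freshIterator = false // hypothetical flag, not the actual fix
    var iter: Iterator[String] = newExecutePlan()

    // Simulated OPERATION_NOT_FOUND handler from the first snippet: issue a
    // new ExecutePlan and throw so the retry loop runs again.
    def onOperationNotFound(): Nothing = {
      iter = newExecutePlan()
      freshIterator = true
      throw new RetryException
    }

    try onOperationNotFound()
    catch {
      case _: RetryException =>
        // Retry loop from the second snippet. Without the freshIterator
        // check, this branch would discard the just-created ExecutePlan
        // iterator and immediately reattach, which is the eagerness the
        // issue describes.
        if (firstTry || freshIterator) {
          firstTry = false
          freshIterator = false
        } else {
          iter = newReattach()
        }
    }
    println(iter.next()) // prints "response from ExecutePlan"
  }
}
{code}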
[jira] [Resolved] (SPARK-45070) Describe the binary and datetime formats of `to_char`/`to_varchar`
[ https://issues.apache.org/jira/browse/SPARK-45070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45070. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42801 [https://github.com/apache/spark/pull/42801] > Describe the binary and datetime formats of `to_char`/`to_varchar` > -- > > Key: SPARK-45070 > URL: https://issues.apache.org/jira/browse/SPARK-45070 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 4.0.0 > > > In the PR, I propose to document the recent changes related to the `format` > of the `to_char`/`to_varchar` functions: > 1. binary formats added by https://github.com/apache/spark/pull/42632 > 2. datetime formats introduced by https://github.com/apache/spark/pull/42534 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
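A hedged illustration of the formats being documented: the binary format names ('base64', 'hex', 'utf-8') and the datetime pattern support are recalled from the linked PRs rather than stated in this thread, so treat both as assumptions, including the expected outputs in the comments.
{code:scala}
import org.apache.spark.sql.SparkSession

object ToCharFormatsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    // Binary input: the format selects the string encoding of the bytes.
    spark.sql("SELECT to_char(x'537061726b2053514c', 'utf-8')").show() // Spark SQL
    spark.sql("SELECT to_char(x'537061726b2053514c', 'hex')").show()   // hex digits
    // Datetime input: the format is a datetime pattern, as with date_format.
    spark.sql("SELECT to_char(date'2023-09-06', 'yyyy-MM-dd')").show() // 2023-09-06
    spark.stop()
  }
}
{code}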
[jira] [Assigned] (SPARK-44833) Spark Connect reattach when initial ExecutePlan didn't reach server doing too eager Reattach
[ https://issues.apache.org/jira/browse/SPARK-44833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44833: Assignee: Juliusz Sompolski > Spark Connect reattach when initial ExecutePlan didn't reach server doing too > eager Reattach > > > Key: SPARK-44833 > URL: https://issues.apache.org/jira/browse/SPARK-44833 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > > In > {code:java} > case ex: StatusRuntimeException > if Option(StatusProto.fromThrowable(ex)) > .exists(_.getMessage.contains("INVALID_HANDLE.OPERATION_NOT_FOUND")) => > if (lastReturnedResponseId.isDefined) { > throw new IllegalStateException( > "OPERATION_NOT_FOUND on the server but responses were already received > from it.", > ex) > } > // Try a new ExecutePlan, and throw upstream for retry. > -> iter = rawBlockingStub.executePlan(initialRequest) > -> throw new GrpcRetryHandler.RetryException {code} > we call executePlan, and throw RetryException to have an exception handled > upstream. > Then it goes to > {code:java} > retry { > if (firstTry) { > // on first try, we use the existing iter. > firstTry = false > } else { > // on retry, the iter is borked, so we need a new one > ->iter = rawBlockingStub.reattachExecute(createReattachExecuteRequest()) > } {code} > and because it's not firstTry, immediately does reattach. > This causes no failure - the reattach will work and attach to the query, the > original executePlan will get detached. But it could be improved. > Same issue is also present in python reattach.py. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44833) Spark Connect reattach when initial ExecutePlan didn't reach server doing too eager Reattach
[ https://issues.apache.org/jira/browse/SPARK-44833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44833. -- Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 42806 [https://github.com/apache/spark/pull/42806] > Spark Connect reattach when initial ExecutePlan didn't reach server doing too > eager Reattach > > > Key: SPARK-44833 > URL: https://issues.apache.org/jira/browse/SPARK-44833 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 3.5.1, 4.0.0 > > > In > {code:java} > case ex: StatusRuntimeException > if Option(StatusProto.fromThrowable(ex)) > .exists(_.getMessage.contains("INVALID_HANDLE.OPERATION_NOT_FOUND")) => > if (lastReturnedResponseId.isDefined) { > throw new IllegalStateException( > "OPERATION_NOT_FOUND on the server but responses were already received > from it.", > ex) > } > // Try a new ExecutePlan, and throw upstream for retry. > -> iter = rawBlockingStub.executePlan(initialRequest) > -> throw new GrpcRetryHandler.RetryException {code} > we call executePlan, and throw RetryException to have an exception handled > upstream. > Then it goes to > {code:java} > retry { > if (firstTry) { > // on first try, we use the existing iter. > firstTry = false > } else { > // on retry, the iter is borked, so we need a new one > ->iter = rawBlockingStub.reattachExecute(createReattachExecuteRequest()) > } {code} > and because it's not firstTry, immediately does reattach. > This causes no failure - the reattach will work and attach to the query, the > original executePlan will get detached. But it could be improved. > Same issue is also present in python reattach.py. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45088) Make `getitem` work with duplicated columns
[ https://issues.apache.org/jira/browse/SPARK-45088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762278#comment-17762278 ] Aparna Garg commented on SPARK-45088: - User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/42828 > Make `getitem` work with duplicated columns > --- > > Key: SPARK-45088 > URL: https://issues.apache.org/jira/browse/SPARK-45088 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45088) Make `getitem` work with duplicated columns
Ruifeng Zheng created SPARK-45088: - Summary: Make `getitem` work with duplicated columns Key: SPARK-45088 URL: https://issues.apache.org/jira/browse/SPARK-45088 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45087) Improve Python DataFrame API test coverage
Ruifeng Zheng created SPARK-45087: - Summary: Improve Python DataFrame API test coverage Key: SPARK-45087 URL: https://issues.apache.org/jira/browse/SPARK-45087 Project: Spark Issue Type: Umbrella Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44801) SQL Page does not capture failed queries in analyzer
[ https://issues.apache.org/jira/browse/SPARK-44801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762262#comment-17762262 ] Snoot.io commented on SPARK-44801: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/42825 > SQL Page does not capture failed queries in analyzer > - > > Key: SPARK-44801 > URL: https://issues.apache.org/jira/browse/SPARK-44801 > Project: Spark > Issue Type: Bug > Components: SQL, Web UI >Affects Versions: 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45086) Display hexadecimal for thread lock hash code
[ https://issues.apache.org/jira/browse/SPARK-45086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762261#comment-17762261 ] Snoot.io commented on SPARK-45086: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/42826 > Display hexadecimal for thread lock hash code > - > > Key: SPARK-45086 > URL: https://issues.apache.org/jira/browse/SPARK-45086 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.4.1, 3.5.0, 4.0.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45086) Display hexadecimal for thread lock hash code
Kent Yao created SPARK-45086: Summary: Display hexadecimal for thread lock hash code Key: SPARK-45086 URL: https://issues.apache.org/jira/browse/SPARK-45086 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.4.1, 3.5.0, 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
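A minimal sketch of the formatting this issue proposes, assuming the intent is to match JVM thread dumps, which print monitor identity hash codes as "- locked <0x...>". The surrounding Web UI code is not shown; only the decimal-to-hexadecimal conversion is.
{code:scala}
object LockHashFormat {
  // Render a lock's identity hash code in hex instead of decimal.
  def format(lock: AnyRef): String =
    f"lock(${lock.getClass.getName}@${System.identityHashCode(lock)}%08x)"

  def main(args: Array[String]): Unit = {
    val monitor = new Object
    println(format(monitor)) // e.g. lock(java.lang.Object@1b6d3586)
  }
}
{code}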
[jira] [Updated] (SPARK-45071) Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data
[ https://issues.apache.org/jira/browse/SPARK-45071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-45071: Fix Version/s: 3.5.1 (was: 3.5.0) > Optimize the processing speed of `BinaryArithmetic#dataType` when processing > multi-column data > -- > > Key: SPARK-45071 > URL: https://issues.apache.org/jira/browse/SPARK-45071 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: ming95 >Assignee: ming95 >Priority: Major > Fix For: 3.4.2, 4.0.0, 3.5.1 > > > Since `BinaryArithmetic#dataType` will recursively process the datatype of > each node, the driver will be very slow when multiple columns are processed. > For example, the following code: > {code:java} > ``` > import spark.implicits._ > import scala.util.Random > import org.apache.spark.sql.functions.sum > import org.apache.spark.sql.types.{StructType, StructField, IntegerType} > val N = 30 > val M = 100 > val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString) > val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5)) > val schema = StructType(columns.map(StructField(_, IntegerType))) > val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_))) > val df = spark.createDataFrame(rdd, schema) > val colExprs = columns.map(sum(_)) > // gen a new column , and add the other 30 column > df.withColumn("new_col_sum", expr(columns.mkString(" + "))) > ``` > {code} > > This code will take a few minutes for the driver to execute in the spark3.4 > version, but only takes a few seconds to execute in the spark3.2 version. > Related issue: SPARK-39316 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
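A self-contained toy model of the slowdown described above, not Spark code: `Add` consults its left child's dataType more than once per call, so a left-deep chain like the generated `c1 + c2 + ... + cN` costs exponentially many calls, while caching the result per node (the shape of the fix) makes it linear.
{code:scala}
object DataTypeRecursionSketch {
  var calls = 0L

  sealed trait Expr { def dataType: String }
  final case class Leaf(tpe: String) extends Expr { def dataType: String = tpe }

  // Uncached: the left child's dataType is computed twice per call (once for
  // the check, once for the result), giving O(2^N) work on a chain of N Adds.
  final case class Add(l: Expr, r: Expr) extends Expr {
    def dataType: String = {
      calls += 1
      require(l.dataType == r.dataType, "type mismatch")
      l.dataType
    }
  }

  // Cached: each node computes its type once, giving O(N) work.
  final case class CachedAdd(l: Expr, r: Expr) extends Expr {
    lazy val dataType: String = {
      calls += 1
      require(l.dataType == r.dataType, "type mismatch")
      l.dataType
    }
  }

  def main(args: Array[String]): Unit = {
    val leaves = Seq.fill(20)(Leaf("int"): Expr)
    calls = 0
    println(leaves.reduce[Expr](Add(_, _)).dataType + ", calls = " + calls)       // 524287
    calls = 0
    println(leaves.reduce[Expr](CachedAdd(_, _)).dataType + ", calls = " + calls) // 19
  }
}
{code}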
[jira] [Commented] (SPARK-45046) Set shadeTestJar of core module to false
[ https://issues.apache.org/jira/browse/SPARK-45046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762260#comment-17762260 ] Snoot.io commented on SPARK-45046: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/42766 > Set shadeTestJar of core module to false > > > Key: SPARK-45046 > URL: https://issues.apache.org/jira/browse/SPARK-45046 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45046) Set shadeTestJar of core module to false
[ https://issues.apache.org/jira/browse/SPARK-45046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45046. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42766 [https://github.com/apache/spark/pull/42766] > Set shadeTestJar of core module to false > > > Key: SPARK-45046 > URL: https://issues.apache.org/jira/browse/SPARK-45046 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45071) Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data
[ https://issues.apache.org/jira/browse/SPARK-45071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762257#comment-17762257 ] Snoot.io commented on SPARK-45071: -- User 'ming95' has created a pull request for this issue: https://github.com/apache/spark/pull/42804 > Optimize the processing speed of `BinaryArithmetic#dataType` when processing > multi-column data > -- > > Key: SPARK-45071 > URL: https://issues.apache.org/jira/browse/SPARK-45071 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: ming95 >Assignee: ming95 >Priority: Major > Fix For: 3.4.2, 3.5.0, 4.0.0 > > > Since `BinaryArithmetic#dataType` will recursively process the datatype of > each node, the driver will be very slow when multiple columns are processed. > For example, the following code: > {code:java} > ``` > import spark.implicits._ > import scala.util.Random > import org.apache.spark.sql.functions.sum > import org.apache.spark.sql.types.{StructType, StructField, IntegerType} > val N = 30 > val M = 100 > val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString) > val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5)) > val schema = StructType(columns.map(StructField(_, IntegerType))) > val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_))) > val df = spark.createDataFrame(rdd, schema) > val colExprs = columns.map(sum(_)) > // gen a new column , and add the other 30 column > df.withColumn("new_col_sum", expr(columns.mkString(" + "))) > ``` > {code} > > This code will take a few minutes for the driver to execute in the spark3.4 > version, but only takes a few seconds to execute in the spark3.2 version. > Related issue: SPARK-39316 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45080) Kafka DSv2 streaming source implementation calls planInputPartitions 4 times per microbatch
[ https://issues.apache.org/jira/browse/SPARK-45080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762258#comment-17762258 ] Snoot.io commented on SPARK-45080: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/42823 > Kafka DSv2 streaming source implementation calls planInputPartitions 4 times > per microbatch > --- > > Key: SPARK-45080 > URL: https://issues.apache.org/jira/browse/SPARK-45080 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Priority: Major > > I was tracking through method calls for DSv2 streaming source, and figured > out planInputPartitions is called 4 times per microbatch. > It turned out that multiple calls of planInputPartitions is due to > `DataSourceV2ScanExecBase.supportsColumnar`, though it is called through > `MicroBatchScanExec.inputPartitions` which is defined as lazy, hence > shouldn't happen. > The behavior seems to be coupled with catalyst and very hard to figure out > why, but with SPARK-44505, we can at least fix this per each data source. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45046) Set shadeTestJar of core module to false
[ https://issues.apache.org/jira/browse/SPARK-45046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45046: Assignee: Yang Jie > Set shadeTestJar of core module to false > > > Key: SPARK-45046 > URL: https://issues.apache.org/jira/browse/SPARK-45046 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45071) Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data
[ https://issues.apache.org/jira/browse/SPARK-45071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-45071. - Fix Version/s: 3.5.0 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 42804 [https://github.com/apache/spark/pull/42804] > Optimize the processing speed of `BinaryArithmetic#dataType` when processing > multi-column data > -- > > Key: SPARK-45071 > URL: https://issues.apache.org/jira/browse/SPARK-45071 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: ming95 >Assignee: ming95 >Priority: Major > Fix For: 3.5.0, 4.0.0, 3.4.2 > > > Since `BinaryArithmetic#dataType` will recursively process the datatype of > each node, the driver will be very slow when multiple columns are processed. > For example, the following code: > {code:java} > ``` > import spark.implicits._ > import scala.util.Random > import org.apache.spark.sql.functions.sum > import org.apache.spark.sql.types.{StructType, StructField, IntegerType} > val N = 30 > val M = 100 > val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString) > val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5)) > val schema = StructType(columns.map(StructField(_, IntegerType))) > val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_))) > val df = spark.createDataFrame(rdd, schema) > val colExprs = columns.map(sum(_)) > // gen a new column , and add the other 30 column > df.withColumn("new_col_sum", expr(columns.mkString(" + "))) > ``` > {code} > > This code will take a few minutes for the driver to execute in the spark3.4 > version, but only takes a few seconds to execute in the spark3.2 version. > Related issue: SPARK-39316 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45071) Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data
[ https://issues.apache.org/jira/browse/SPARK-45071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-45071: --- Assignee: ming95 > Optimize the processing speed of `BinaryArithmetic#dataType` when processing > multi-column data > -- > > Key: SPARK-45071 > URL: https://issues.apache.org/jira/browse/SPARK-45071 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: ming95 >Assignee: ming95 >Priority: Major > > Since `BinaryArithmetic#dataType` will recursively process the datatype of > each node, the driver will be very slow when multiple columns are processed. > For example, the following code: > {code:java} > ``` > import spark.implicits._ > import scala.util.Random > import org.apache.spark.sql.functions.sum > import org.apache.spark.sql.types.{StructType, StructField, IntegerType} > val N = 30 > val M = 100 > val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString) > val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5)) > val schema = StructType(columns.map(StructField(_, IntegerType))) > val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_))) > val df = spark.createDataFrame(rdd, schema) > val colExprs = columns.map(sum(_)) > // gen a new column , and add the other 30 column > df.withColumn("new_col_sum", expr(columns.mkString(" + "))) > ``` > {code} > > This code will take a few minutes for the driver to execute in the spark3.4 > version, but only takes a few seconds to execute in the spark3.2 version. > Related issue: SPARK-39316 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45077) Upgrade dagre-d3.js from 0.4.3 to 0.6.4
[ https://issues.apache.org/jira/browse/SPARK-45077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-45077: - Labels: (was: pat) > Upgrade dagre-d3.js from 0.4.3 to 0.6.4 > --- > > Key: SPARK-45077 > URL: https://issues.apache.org/jira/browse/SPARK-45077 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45083) Refine docstring of `min`
[ https://issues.apache.org/jira/browse/SPARK-45083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45083: - Assignee: Allison Wang > Refine docstring of `min` > - > > Key: SPARK-45083 > URL: https://issues.apache.org/jira/browse/SPARK-45083 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Refine the docstring of the function `min`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45083) Refine docstring of `min`
[ https://issues.apache.org/jira/browse/SPARK-45083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45083. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42821 [https://github.com/apache/spark/pull/42821] > Refine docstring of `min` > - > > Key: SPARK-45083 > URL: https://issues.apache.org/jira/browse/SPARK-45083 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 4.0.0 > > > Refine the docstring of the function `min`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-45085: Summary: Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic (was: Merge UNSUPPORTED_TEMP_VIEW_OPERATION to UNSUPPORTED_VIEW_OPERATION and refactor some logic) > Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and > refactor some logic > - > > Key: SPARK-45085 > URL: https://issues.apache.org/jira/browse/SPARK-45085 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION to UNSUPPORTED_VIEW_OPERATION and refactor some logic
BingKun Pan created SPARK-45085: --- Summary: Merge UNSUPPORTED_TEMP_VIEW_OPERATION to UNSUPPORTED_VIEW_OPERATION and refactor some logic Key: SPARK-45085 URL: https://issues.apache.org/jira/browse/SPARK-45085 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44805) Data lost after union using spark.sql.parquet.enableNestedColumnVectorizedReader=true
[ https://issues.apache.org/jira/browse/SPARK-44805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762234#comment-17762234 ] Bruce Robbins commented on SPARK-44805: --- I looked at this yesterday and I think I have a handle on what's going on. I will make a PR in the coming days. > Data lost after union using > spark.sql.parquet.enableNestedColumnVectorizedReader=true > - > > Key: SPARK-44805 > URL: https://issues.apache.org/jira/browse/SPARK-44805 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1 > Environment: pySpark, linux, hadoop, parquet. >Reporter: Jakub Wozniak >Priority: Major > Labels: correctness > > When union-ing two DataFrames read from parquet containing nested structures > (2 fields of array types where one is double and second is integer) data from > the second field seems to be lost (zeros are set instead). > This seems to be the case only if nested vectorised reader is used > (spark.sql.parquet.enableNestedColumnVectorizedReader=true). > The following Python code reproduces the problem: > {code:java} > from pyspark.sql import SparkSession > from pyspark.sql.types import * > # PREPARING DATA > data1 = [] > data2 = [] > for i in range(2): > data1.append( (([1,2,3],[1,1,2]),i)) > data2.append( (([1.0,2.0,3.0],[1,1]),i+10)) > schema1 = StructType([ > StructField('value', StructType([ > StructField('f1', ArrayType(IntegerType()), True), > StructField('f2', ArrayType(IntegerType()), True) > ])), > StructField('id', IntegerType(), True) > ]) > schema2 = StructType([ > StructField('value', StructType([ > StructField('f1', ArrayType(DoubleType()), True), > StructField('f2', ArrayType(IntegerType()), True) > ])), > StructField('id', IntegerType(), True) > ]) > spark = SparkSession.builder.getOrCreate() > data_dir = "/user//" > df1 = spark.createDataFrame(data1, schema1) > df1.write.mode('overwrite').parquet(data_dir + "data1") > df2 = spark.createDataFrame(data2, schema2) > df2.write.mode('overwrite').parquet(data_dir + "data2") > # READING DATA > parquet1 = spark.read.parquet(data_dir + "data1") > parquet2 = spark.read.parquet(data_dir + "data2") > # UNION > out = parquet1.union(parquet2) > parquet1.select("value.f2").distinct().show() > out.select("value.f2").distinct().show() > print(parquet1.collect()) > print(out.collect()) {code} > Output: > {code:java} > +-+ > | f2| > +-+ > |[1, 1, 2]| > +-+ > +-+ > | f2| > +-+ > |[0, 0, 0]| > | [1, 1]| > +-+ > [ > Row(value=Row(f1=[1, 2, 3], f2=[1, 1, 2]), id=0), > Row(value=Row(f1=[1, 2, 3], f2=[1, 1, 2]), id=1) > ] > [ > Row(value=Row(f1=[1.0, 2.0, 3.0], f2=[0, 0, 0]), id=0), > Row(value=Row(f1=[1.0, 2.0, 3.0], f2=[0, 0, 0]), id=1), > Row(value=Row(f1=[1.0, 2.0, 3.0], f2=[1, 1]), id=10), > Row(value=Row(f1=[1.0, 2.0, 3.0], f2=[1, 1]), id=11) > ] {code} > Please notice that values for the field f2 are lost after the union is done. > This only happens when this data is read from parquet files. > Could you please look into this? > Best regards, > Jakub -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
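A hedged workaround sketch for the behavior reported above, pending the fix: read the affected data with the nested-column vectorized reader turned off. The config name comes from the report itself; the path is a placeholder since the report elides its own.
{code:scala}
import org.apache.spark.sql.SparkSession

object NestedReaderWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    // Trade read performance for correctness until the fix lands.
    spark.conf.set("spark.sql.parquet.enableNestedColumnVectorizedReader", "false")

    val dataDir = "/tmp/spark-44805/" // placeholder path
    val out = spark.read.parquet(dataDir + "data1")
      .union(spark.read.parquet(dataDir + "data2"))
    out.select("value.f2").distinct().show() // f2 values survive the union
  }
}
{code}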
[jira] [Created] (SPARK-45084) ProgressReport should include an accurate effective shuffle partition number
Siying Dong created SPARK-45084: --- Summary: ProgressReport should include an accurate effective shuffle partition number Key: SPARK-45084 URL: https://issues.apache.org/jira/browse/SPARK-45084 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.4.2 Reporter: Siying Dong Currently, there is a numShufflePartitions "metric" reported in the StateOperatorProgress part of the progress report. However, the number is aggregated across executors, so in the case of task retries or speculative execution, the metric is higher than the number of shuffle partitions for the query plan. The number of shuffle partitions can be useful for reporting purposes, so having an accurate metric is helpful. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
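For context, a sketch of where this metric surfaces, assuming an active StreamingQuery named `query`; `numShufflePartitions` is the StateOperatorProgress field named above, while `operatorName` is an assumption about the progress schema on the affected versions.
{code:scala}
// Assumes `query` is an active org.apache.spark.sql.streaming.StreamingQuery.
val progress = query.lastProgress
progress.stateOperators.foreach { op =>
  // On task retries or speculative execution this value can exceed the
  // plan's actual shuffle partition count, which is what the issue calls out.
  println(s"${op.operatorName}: numShufflePartitions = ${op.numShufflePartitions}")
}
{code}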
[jira] [Created] (SPARK-45083) Refine docstring of `min`
Allison Wang created SPARK-45083: Summary: Refine docstring of `min` Key: SPARK-45083 URL: https://issues.apache.org/jira/browse/SPARK-45083 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of the function `min`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45082) Review and fix issues in API docs
[ https://issues.apache.org/jira/browse/SPARK-45082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-45082: Fix Version/s: (was: 3.1.1) > Review and fix issues in API docs > - > > Key: SPARK-45082 > URL: https://issues.apache.org/jira/browse/SPARK-45082 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > > Compare the 3.4 API doc with the 3.5 RC3 cut. Fix the following issues: > * Remove the leaking class/object in API doc -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45057) Deadlock caused by rdd replication level of 2
[ https://issues.apache.org/jira/browse/SPARK-45057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongwei Zhu updated SPARK-45057: - Description: When 2 tasks try to compute the same RDD with a replication level of 2 while running on only 2 executors, a deadlock will happen. Tasks only release the lock after writing to the local machine and replicating to the remote executor. ||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)|| |T0|write lock of rdd| | | | |T1| | |write lock of rdd| | |T2|replicate -> UploadBlockSync (blocked by T4)| | | | |T3| | | |Received UploadBlock request from T1 (blocked by T3)| |T4| | |replicate -> UploadBlockSync (blocked by T2)| | |T5| |Received UploadBlock request from T3 (blocked by T1)| | | |T6|Deadlock|Deadlock|Deadlock|Deadlock| was: When 2 tasks try to compute the same RDD with a replication level of 2 while running on only 2 executors, a deadlock will happen. ||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)|| |T0|write lock of rdd| | | | |T1| | |write lock of rdd| | |T2|replicate -> UploadBlockSync (blocked by T4)| | | | |T3| | | |Received UploadBlock request from T1 (blocked by T3)| |T4| | |replicate -> UploadBlockSync (blocked by T2)| | |T5| |Received UploadBlock request from T3 (blocked by T1)| | | |T6|Deadlock|Deadlock|Deadlock|Deadlock| > Deadlock caused by rdd replication level of 2 > - > > Key: SPARK-45057 > URL: https://issues.apache.org/jira/browse/SPARK-45057 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Zhongwei Zhu >Priority: Major > > > When 2 tasks try to compute the same RDD with a replication level of 2 while > running on only 2 executors, a deadlock will happen. > Tasks only release the lock after writing to the local machine and > replicating to the remote executor. > > ||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task > Thread T3)||Exe 2 (Shuffle Server Thread T4)|| > |T0|write lock of rdd| | | | > |T1| | |write lock of rdd| | > |T2|replicate -> UploadBlockSync (blocked by T4)| | | | > |T3| | | |Received UploadBlock request from T1 (blocked by T3)| > |T4| | |replicate -> UploadBlockSync (blocked by T2)| | > |T5| |Received UploadBlock request from T3 (blocked by T1)| | | > |T6|Deadlock|Deadlock|Deadlock|Deadlock| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
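A minimal sketch of the setup in the timeline above, assuming a hypothetical two-executor local cluster. Whether the deadlock actually manifests depends on the two tasks' timing; this shows the ingredients (replication level 2 plus concurrent computation of the same blocks), not a deterministic reproduction.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object ReplicatedRddDeadlockSetup {
  def main(args: Array[String]): Unit = {
    // Two executors with one core each, mirroring Exe 1 / Exe 2 above.
    val spark = SparkSession.builder()
      .master("local-cluster[2,1,1024]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Replication level 2: each computed block is also uploaded to the peer.
    val rdd = sc.parallelize(1 to 4, 2).persist(StorageLevel.MEMORY_ONLY_2)

    // Two jobs computing the same partitions concurrently: each task can hold
    // its local block write lock while blocking in UploadBlockSync, forming
    // the cycle shown in the table.
    val t1 = new Thread(() => { rdd.count(); () })
    val t2 = new Thread(() => { rdd.count(); () })
    t1.start(); t2.start()
    t1.join(); t2.join()
    spark.stop()
  }
}
{code}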
[jira] [Resolved] (SPARK-45051) Connect: Use UUIDv7 for operation IDs to make operations chronologically sortable
[ https://issues.apache.org/jira/browse/SPARK-45051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Dillitz resolved SPARK-45051. Resolution: Abandoned We agreed that the benefits of adding this are not big enough because we can not rely on the operation ID being UUIDv7 and need to sort by startDate anyway. Closing this PR. > Connect: Use UUIDv7 for operation IDs to make operations chronologically > sortable > - > > Key: SPARK-45051 > URL: https://issues.apache.org/jira/browse/SPARK-45051 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.1 >Reporter: Robert Dillitz >Priority: Major > Labels: Connect > > Spark Connect currently uses UUIDv4 for operation IDs. Using UUIDv7 instead > allows us to sort operations by ID to receive a chronological order while > keeping the collision-free properties we require from this ID. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45082) Review and fix issues in API docs
[ https://issues.apache.org/jira/browse/SPARK-45082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-45082: Description: Compare the 3.4 API doc with the 3.5 RC3 cut. Fix the following issues: * Remove the leaking class/object in API doc was: Compare the 3.1.1 API doc with the latest release version 3.0.1. Fix the following issues: * Add missing `Since` annotation for new APIs * Remove the leaking class/object in API doc > Review and fix issues in API docs > - > > Key: SPARK-45082 > URL: https://issues.apache.org/jira/browse/SPARK-45082 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.1.1 > > > Compare the 3.4 API doc with the 3.5 RC3 cut. Fix the following issues: > * Remove the leaking class/object in API doc -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45082) Review and fix issues in API docs
Yuanjian Li created SPARK-45082: --- Summary: Review and fix issues in API docs Key: SPARK-45082 URL: https://issues.apache.org/jira/browse/SPARK-45082 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.1.1 Reporter: Yuanjian Li Assignee: Yuanjian Li Fix For: 3.1.1 Compare the 3.1.1 API doc with the latest release version 3.0.1. Fix the following issues: * Add missing `Since` annotation for new APIs * Remove the leaking class/object in API doc -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45082) Review and fix issues in API docs
[ https://issues.apache.org/jira/browse/SPARK-45082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-45082: Affects Version/s: 3.5.0 (was: 3.1.1) > Review and fix issues in API docs > - > > Key: SPARK-45082 > URL: https://issues.apache.org/jira/browse/SPARK-45082 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.1.1 > > > Compare the 3.1.1 API doc with the latest release version 3.0.1. Fix the > following issues: > * Add missing `Since` annotation for new APIs > * Remove the leaking class/object in API doc -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45072) Fix Outerscopes for same cell evaluation
[ https://issues.apache.org/jira/browse/SPARK-45072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45072. --- Fix Version/s: 3.5.1 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/42807 > Fix Outerscopes for same cell evaluation > > > Key: SPARK-45072 > URL: https://issues.apache.org/jira/browse/SPARK-45072 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.5.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45072) Fix Outerscopes for same cell evaluation
[ https://issues.apache.org/jira/browse/SPARK-45072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45072: -- Issue Type: Bug (was: New Feature) > Fix Outerscopes for same cell evaluation > > > Key: SPARK-45072 > URL: https://issues.apache.org/jira/browse/SPARK-45072 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44284) Introduce simple conf system for sql/api
[ https://issues.apache.org/jira/browse/SPARK-44284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762123#comment-17762123 ] Herman van Hövell commented on SPARK-44284: --- I added a description. IMO the change itself is not too spectacular. > Introduce simple conf system for sql/api > --- > > Key: SPARK-44284 > URL: https://issues.apache.org/jira/browse/SPARK-44284 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.5.0 > > > Create a simple conf system for classes in sql/api. This is needed for a > number of classes that are moved from sql/catalyst to sql/api that require > configuration access (e.g. timeZone, parsing behavior, ...). > The change will add a small common interface that allows you to read the > needed configurations, this interface is implemented by SQLConf and SQLConf > will be used when we are executing on the driver, and there will be an > implementation using the default values for when we are in Connect mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45081) Encoders.bean no longer works with read-only properties
Giambattista Bloisi created SPARK-45081: --- Summary: Encoders.bean no longer works with read-only properties Key: SPARK-45081 URL: https://issues.apache.org/jira/browse/SPARK-45081 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1 Reporter: Giambattista Bloisi Since Spark 3.4.x, an exception is thrown when Encoders.bean is called with a bean that has read-only properties, such as: {code:java} public static class ReadOnlyPropertyBean implements Serializable { public boolean isEmpty() { return true; } } {code} Encoders.bean(ReadOnlyPropertyBean.class) will throw: {code:java} java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:529) at scala.None$.get(Option.scala:527) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$deserializerFor$8(ScalaReflection.scala:359) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.catalyst.ScalaReflection$.deserializerFor(ScalaReflection.scala:348) at org.apache.spark.sql.catalyst.ScalaReflection$.deserializerFor(ScalaReflection.scala:183) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:56) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:62) at org.apache.spark.sql.Encoders$.bean(Encoders.scala:179) at org.apache.spark.sql.Encoders.bean(Encoders.scala) {code} This problem is also described in [link Encoders.bean doesn't work anymore on a Java POJO, with Spark 3.4.0|https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
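A hedged workaround sketch: on the affected versions the failure comes from the deserializer looking up a write method for the property, so giving the bean a matching setter (making the property read-write) sidesteps it. `ReadWritePropertyBean` is a hypothetical variant of the reported bean, written in Scala with an explicit getter/setter pair so `Encoders.bean` sees a complete property.
{code:scala}
import org.apache.spark.sql.Encoders

class ReadWritePropertyBean extends Serializable {
  private var empty: Boolean = true
  def isEmpty: Boolean = empty                           // getter, as in the report
  def setEmpty(value: Boolean): Unit = { empty = value } // added setter
}

object BeanEncoderWorkaround {
  def main(args: Array[String]): Unit = {
    val encoder = Encoders.bean(classOf[ReadWritePropertyBean])
    println(encoder.schema) // a single boolean field `empty`, no exception
  }
}
{code}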
[jira] [Updated] (SPARK-44284) Introduce simple conf system for sql/api
[ https://issues.apache.org/jira/browse/SPARK-44284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell updated SPARK-44284: -- Description: Create a simple conf system for classes in sql/api. This is needed for a number of classes that are moved from sql/catalyst to sql/api that require configuration access (e.g. timeZone, parsing behavior, ...). The change will add a small common interface that allows you to read the needed configurations, this interface is implemented by SQLConf and SQLConf will be used when we are executing on the driver, and there will be an implementation using the default values for when we are in Connect mode. was:Create a simple conf system for classes in sql/api > Introduce simple conf system for sql/api > --- > > Key: SPARK-44284 > URL: https://issues.apache.org/jira/browse/SPARK-44284 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.5.0 > > > Create a simple conf system for classes in sql/api. This is needed for a > number of classes that are moved from sql/catalyst to sql/api that require > configuration access (e.g. timeZone, parsing behavior, ...). > The change will add a small common interface that allows you to read the > needed configurations, this interface is implemented by SQLConf and SQLConf > will be used when we are executing on the driver, and there will be an > implementation using the default values for when we are in Connect mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45075) Alter table with invalid default value will not report error
[ https://issues.apache.org/jira/browse/SPARK-45075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762110#comment-17762110 ] Ignite TC Bot commented on SPARK-45075: --- User 'Hisoka-X' has created a pull request for this issue: https://github.com/apache/spark/pull/42810 > Alter table with invalid default value will not report error > > > Key: SPARK-45075 > URL: https://issues.apache.org/jira/browse/SPARK-45075 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Jia Fan >Priority: Major > > create table t(i boolean, s bigint); > alter table t alter column s set default badvalue; > > The code wouldn't report an error on DataSource V2, which does not align with > V1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44284) Introduce simple conf system for sql/api
[ https://issues.apache.org/jira/browse/SPARK-44284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762102#comment-17762102 ] Thomas Graves commented on SPARK-44284: --- Can we get a description on this? This seems like a fairly significant change for a one-line summary without a description here or in the PR. > Introduce simple conf system for sql/api > --- > > Key: SPARK-44284 > URL: https://issues.apache.org/jira/browse/SPARK-44284 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.5.0 > > > Create a simple conf system for classes in sql/api -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45080) Kafka DSv2 streaming source implementation calls planInputPartitions 4 times per microbatch
[ https://issues.apache.org/jira/browse/SPARK-45080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762089#comment-17762089 ] Jungtaek Lim commented on SPARK-45080: -- Working on this. Will submit a PR soon. > Kafka DSv2 streaming source implementation calls planInputPartitions 4 times > per microbatch > --- > > Key: SPARK-45080 > URL: https://issues.apache.org/jira/browse/SPARK-45080 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Priority: Major > > I was tracking through method calls for DSv2 streaming source, and figured > out planInputPartitions is called 4 times per microbatch. > It turned out that multiple calls of planInputPartitions is due to > `DataSourceV2ScanExecBase.supportsColumnar`, though it is called through > `MicroBatchScanExec.inputPartitions` which is defined as lazy, hence > shouldn't happen. > The behavior seems to be coupled with catalyst and very hard to figure out > why, but with SPARK-44505, we can at least fix this per each data source. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45080) Kafka DSv2 streaming source implementation calls planInputPartitions 4 times per microbatch
Jungtaek Lim created SPARK-45080: Summary: Kafka DSv2 streaming source implementation calls planInputPartitions 4 times per microbatch Key: SPARK-45080 URL: https://issues.apache.org/jira/browse/SPARK-45080 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Jungtaek Lim I was tracing through method calls for the DSv2 streaming source and found that planInputPartitions is called 4 times per microbatch. It turned out that the multiple calls of planInputPartitions are due to `DataSourceV2ScanExecBase.supportsColumnar`, even though it is called through `MicroBatchScanExec.inputPartitions`, which is defined as lazy and hence shouldn't trigger recomputation. The behavior seems to be coupled with Catalyst and is very hard to pin down, but with SPARK-44505 we can at least fix this per data source. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
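A sketch of the per-source mitigation hinted at above: memoizing partition planning inside the stream so repeated planInputPartitions calls within one microbatch reuse the result. The MicroBatchStream types are from the DSv2 API; the cache-by-offset-range scheme is a hypothetical illustration, not the actual fix in the Kafka connector.
{code:scala}
import org.apache.spark.sql.connector.read.InputPartition
import org.apache.spark.sql.connector.read.streaming.{MicroBatchStream, Offset}

abstract class CachingMicroBatchStream extends MicroBatchStream {
  // Planning for a given (start, end) range is deterministic, so cache the
  // last result and reuse it when the same range is requested again.
  @volatile private var cached: ((Offset, Offset), Array[InputPartition]) = _

  // Subclasses do the actual (expensive) planning here.
  protected def doPlanInputPartitions(start: Offset, end: Offset): Array[InputPartition]

  override def planInputPartitions(start: Offset, end: Offset): Array[InputPartition] = {
    val key = (start, end)
    val c = cached
    if (c != null && c._1 == key) {
      c._2
    } else {
      val parts = doPlanInputPartitions(start, end)
      cached = (key, parts)
      parts
    }
  }
}
{code}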
[jira] [Commented] (SPARK-45079) percentile_approx() fails with an internal error on NULL accuracy
[ https://issues.apache.org/jira/browse/SPARK-45079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762061#comment-17762061 ] Aparna Garg commented on SPARK-45079: - User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/42817 > percentile_approx() fails with an internal error on NULL accuracy > - > > Key: SPARK-45079 > URL: https://issues.apache.org/jira/browse/SPARK-45079 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > The example below demonstrates the issue: > {code:sql} > spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), > NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); > [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. > You hit a bug in Spark or the Spark plugins you use. Please, report this bug > to the corresponding communities or vendors, and provide the full stack trace. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45079) percentile_approx() fails with an internal error on NULL accuracy
[ https://issues.apache.org/jira/browse/SPARK-45079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-45079: - Affects Version/s: 3.3.2 > percentile_approx() fails with an internal error on NULL accuracy > - > > Key: SPARK-45079 > URL: https://issues.apache.org/jira/browse/SPARK-45079 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > The example below demonstrates the issue: > {code:sql} > spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), > NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); > [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. > You hit a bug in Spark or the Spark plugins you use. Please, report this bug > to the corresponding communities or vendors, and provide the full stack trace. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45079) percentile_approx() fails with an internal error on NULL accuracy
[ https://issues.apache.org/jira/browse/SPARK-45079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-45079: - Affects Version/s: 3.5.0 > percentile_approx() fails with an internal error on NULL accuracy > - > > Key: SPARK-45079 > URL: https://issues.apache.org/jira/browse/SPARK-45079 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0, 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > The example below demonstrates the issue: > {code:sql} > spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), > NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); > [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. > You hit a bug in Spark or the Spark plugins you use. Please, report this bug > to the corresponding communities or vendors, and provide the full stack trace. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45079) percentile_approx() fails with an internal error on NULL accuracy
[ https://issues.apache.org/jira/browse/SPARK-45079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-45079: - Affects Version/s: 3.4.1 > percentile_approx() fails with an internal error on NULL accuracy > - > > Key: SPARK-45079 > URL: https://issues.apache.org/jira/browse/SPARK-45079 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > The example below demonstrates the issue: > {code:sql} > spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), > NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); > [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. > You hit a bug in Spark or the Spark plugins you use. Please, report this bug > to the corresponding communities or vendors, and provide the full stack trace. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44404) Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]
[ https://issues.apache.org/jira/browse/SPARK-44404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762055#comment-17762055 ] Aparna Garg commented on SPARK-44404: - User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/42109 > Assign names to the error class > _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278] > -- > > Key: SPARK-44404 > URL: https://issues.apache.org/jira/browse/SPARK-44404 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45079) percentile_approx() fails with an internal error on NULL accuracy
[ https://issues.apache.org/jira/browse/SPARK-45079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-45079: - Description: The example below demonstrates the issue: {code:sql} spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. {code} was: The example below demonstrates the issue: {code:sql} spark-sql (default)> SELECT to_char(x'537061726b2053514c', CAST(NULL AS STRING)); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. {code} > percentile_approx() fails with an internal error on NULL accuracy > - > > Key: SPARK-45079 > URL: https://issues.apache.org/jira/browse/SPARK-45079 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > The example below demonstrates the issue: > {code:sql} > spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), > NULL) FROM VALUES (0), (1), (2), (10) AS tab(col); > [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. > You hit a bug in Spark or the Spark plugins you use. Please, report this bug > to the corresponding communities or vendors, and provide the full stack trace. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45079) percentile_approx() fails with an internal error on NULL accuracy
[ https://issues.apache.org/jira/browse/SPARK-45079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-45079: - Fix Version/s: (was: 4.0.0) > percentile_approx() fails with an internal error on NULL accuracy > - > > Key: SPARK-45079 > URL: https://issues.apache.org/jira/browse/SPARK-45079 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > The example below demonstrates the issue: > {code:sql} > spark-sql (default)> SELECT to_char(x'537061726b2053514c', CAST(NULL AS > STRING)); > [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. > You hit a bug in Spark or the Spark plugins you use. Please, report this bug > to the corresponding communities or vendors, and provide the full stack trace. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45079) percentile_approx() fails with an internal error on NULL accuracy
Max Gekk created SPARK-45079: Summary: percentile_approx() fails with an internal error on NULL accuracy Key: SPARK-45079 URL: https://issues.apache.org/jira/browse/SPARK-45079 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 4.0.0 The example below demonstrates the issue: {code:sql} spark-sql (default)> SELECT to_char(x'537061726b2053514c', CAST(NULL AS STRING)); [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
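Note that the Created entry above still carries the to_char snippet copied from SPARK-45070; the actual reproducer is the percentile_approx call with a NULL accuracy, as fixed in the later description update. For contrast, a minimal sketch of the same query with a concrete accuracy value, which does not hit the internal error (the result line is illustrative for this 4-row input):
{code:sql}
-- Passing a concrete accuracy instead of NULL succeeds; only the NULL
-- accuracy path trips the internal error described in this ticket.
spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
[1,1,0]
{code}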
[jira] [Updated] (SPARK-45078) The ArrayInsert function should apply explicit casting when the element type does not equal the derived component type
[ https://issues.apache.org/jira/browse/SPARK-45078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ran Tao updated SPARK-45078: Description: Generally speaking, array_insert has the same insert semantics as array_prepend/array_append. However, if we run SQL that casts the element, as below, array_prepend/array_append return the right result, but array_insert fails.
{code:java}
spark-sql (default)> select array_prepend(array(1), cast(2 as tinyint));
[2,1]
Time taken: 0.123 seconds, Fetched 1 row(s) {code}
{code:java}
spark-sql (default)> select array_append(array(1), cast(2 as tinyint));
[1,2]
Time taken: 0.206 seconds, Fetched 1 row(s) {code}
{code:java}
spark-sql (default)> select array_insert(array(1), 2, cast(2 as tinyint));
[DATATYPE_MISMATCH.ARRAY_FUNCTION_DIFF_TYPES] Cannot resolve "array_insert(array(1), 2, CAST(2 AS TINYINT))" due to data type mismatch: Input to `array_insert` should have been "ARRAY" followed by a value with same element type, but it's ["ARRAY", "TINYINT"].; line 1 pos 7;
'Project [unresolvedalias(array_insert(array(1), 2, cast(2 as tinyint)), None)]
+- OneRowRelation {code}
The reported error is clear; however, we should probably apply explicit casting here, because multiset types such as array and map allow operands of the same type family to coexist.

was: Generally speaking, array_insert has the same insert semantics as array_prepend/array_append. However, if we run SQL that casts the element, as below, array_prepend/array_append return the right result, but array_insert fails.
{code:java}
spark-sql (default)> select array_prepend(array(1), cast(2 as tinyint));
[2,1]
Time taken: 0.123 seconds, Fetched 1 row(s) {code}
{code:java}
spark-sql (default)> select array_append(array(1), cast(2 as tinyint));
[1,2]
Time taken: 0.206 seconds, Fetched 1 row(s) {code}
{code:java}
spark-sql (default)> select array_insert(array(1), 2, cast(2 as tinyint));
[DATATYPE_MISMATCH.ARRAY_FUNCTION_DIFF_TYPES] Cannot resolve "array_insert(array(1), 2, CAST(2 AS TINYINT))" due to data type mismatch: Input to `array_insert` should have been "ARRAY" followed by a value with same element type, but it's ["ARRAY", "TINYINT"].; line 1 pos 7;
'Project [unresolvedalias(array_insert(array(1), 2, cast(2 as tinyint)), None)]
+- OneRowRelation {code}
The reported error is clear; however, we should probably apply explicit casting here, because multiset types such as array and map allow operands of the same type family to coexist.

> The ArrayInsert function should apply explicit casting when the element type does not equal the derived component type
> -
>
> Key: SPARK-45078
> URL: https://issues.apache.org/jira/browse/SPARK-45078
> Project: Spark
> Issue Type: Bug
> Components: SQL
>Affects Versions: 3.4.1
>Reporter: Ran Tao
>Priority: Major
>
> Generally speaking, array_insert has the same insert semantics as array_prepend/array_append. However, if we run SQL that casts the element, as below, array_prepend/array_append return the right result, but array_insert fails.
> {code:java}
> spark-sql (default)> select array_prepend(array(1), cast(2 as tinyint));
> [2,1]
> Time taken: 0.123 seconds, Fetched 1 row(s) {code}
> {code:java}
> spark-sql (default)> select array_append(array(1), cast(2 as tinyint));
> [1,2]
> Time taken: 0.206 seconds, Fetched 1 row(s) {code}
> {code:java}
> spark-sql (default)> select array_insert(array(1), 2, cast(2 as tinyint));
> [DATATYPE_MISMATCH.ARRAY_FUNCTION_DIFF_TYPES] Cannot resolve "array_insert(array(1), 2, CAST(2 AS TINYINT))" due to data type mismatch: Input to `array_insert` should have been "ARRAY" followed by a value with same element type, but it's ["ARRAY", "TINYINT"].; line 1 pos 7;
> 'Project [unresolvedalias(array_insert(array(1), 2, cast(2 as tinyint)), None)]
> +- OneRowRelation {code}
> The reported error is clear; however, we should probably apply explicit casting here, because multiset types such as array and map allow operands of the same type family to coexist. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
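Until such an implicit cast is in place, a workaround is to cast the inserted element to the array's element type yourself; a minimal sketch:
{code:sql}
-- Workaround sketch: make the element type (int here) match the array's
-- element type explicitly, so no implicit cast is required.
spark-sql (default)> select array_insert(array(1), 2, cast(2 as int));
[1,2]
{code}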
[jira] [Commented] (SPARK-45068) Make function output column name consistent in case
[ https://issues.apache.org/jira/browse/SPARK-45068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762050#comment-17762050 ] Aparna Garg commented on SPARK-45068: - User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/42797 > Make function output column name consistent in case > --- > > Key: SPARK-45068 > URL: https://issues.apache.org/jira/browse/SPARK-45068 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45078) The ArrayInsert function should apply explicit casting when the element type does not equal the derived component type
Ran Tao created SPARK-45078: --- Summary: The ArrayInsert function should apply explicit casting when the element type does not equal the derived component type Key: SPARK-45078 URL: https://issues.apache.org/jira/browse/SPARK-45078 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1 Reporter: Ran Tao Generally speaking, array_insert has the same insert semantics as array_prepend/array_append. However, if we run SQL that casts the element, as below, array_prepend/array_append return the right result, but array_insert fails.
{code:java}
spark-sql (default)> select array_prepend(array(1), cast(2 as tinyint));
[2,1]
Time taken: 0.123 seconds, Fetched 1 row(s) {code}
{code:java}
spark-sql (default)> select array_append(array(1), cast(2 as tinyint));
[1,2]
Time taken: 0.206 seconds, Fetched 1 row(s) {code}
{code:java}
spark-sql (default)> select array_insert(array(1), 2, cast(2 as tinyint));
[DATATYPE_MISMATCH.ARRAY_FUNCTION_DIFF_TYPES] Cannot resolve "array_insert(array(1), 2, CAST(2 AS TINYINT))" due to data type mismatch: Input to `array_insert` should have been "ARRAY" followed by a value with same element type, but it's ["ARRAY", "TINYINT"].; line 1 pos 7;
'Project [unresolvedalias(array_insert(array(1), 2, cast(2 as tinyint)), None)]
+- OneRowRelation {code}
The reported error is clear; however, we should probably apply explicit casting here, because multiset types such as array and map allow operands of the same type family to coexist. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45070) Describe the binary and datetime formats of `to_char`/`to_varchar`
[ https://issues.apache.org/jira/browse/SPARK-45070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762046#comment-17762046 ] Aparna Garg commented on SPARK-45070: - User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/42801 > Describe the binary and datetime formats of `to_char`/`to_varchar` > -- > > Key: SPARK-45070 > URL: https://issues.apache.org/jira/browse/SPARK-45070 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > In the PR, I propose to document the recent changes related to the `format` > of the `to_char`/`to_varchar` functions: > 1. binary formats added by https://github.com/apache/spark/pull/42632 > 2. datetime formats introduced by https://github.com/apache/spark/pull/42534 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
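For context, the binary formats let to_char render a BINARY value as a string. A hedged sketch, assuming the 'utf-8' format name from the linked PR:
{code:sql}
-- to_char on BINARY input; x'537061726b2053514c' is the UTF-8 encoding of
-- the string 'Spark SQL'.
spark-sql (default)> SELECT to_char(x'537061726b2053514c', 'utf-8');
Spark SQL
{code}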
[jira] [Commented] (SPARK-45022) Provide context for dataset API errors
[ https://issues.apache.org/jira/browse/SPARK-45022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762043#comment-17762043 ] Aparna Garg commented on SPARK-45022: - User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/42740 > Provide context for dataset API errors > -- > > Key: SPARK-45022 > URL: https://issues.apache.org/jira/browse/SPARK-45022 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Peter Toth >Priority: Major > > SQL failures already provide nice error context when there is a failure: > {noformat} > org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. > Use `try_divide` to tolerate divisor being 0 and return NULL instead. If > necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. > == SQL(line 1, position 1) == > a / b > ^ > at > org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201) > at > org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala) > ... > {noformat} > We could add a similar user friendly error context to Dataset APIs. > E.g. consider the following Spark app SimpleApp.scala: > {noformat} >1 import org.apache.spark.sql.SparkSession >2 import org.apache.spark.sql.functions._ >3 >4 object SimpleApp { >5def main(args: Array[String]) { >6 val spark = SparkSession.builder.appName("Simple > Application").config("spark.sql.ansi.enabled", true).getOrCreate() >7 import spark.implicits._ >8 >9 val c = col("a") / col("b") > 10 > 11 Seq((1, 0)).toDF("a", "b").select(c).show() > 12 > 13 spark.stop() > 14} > 15 } > {noformat} > then the error context could be: > {noformat} > Exception in thread "main" org.apache.spark.SparkArithmeticException: > [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being > 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to > "false" to bypass this error. > == Dataset == > "div" was called from SimpleApp$.main(SimpleApp.scala:9) > at > org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201) > at > org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:672 > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45022) Provide context for dataset API errors
[ https://issues.apache.org/jira/browse/SPARK-45022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762041#comment-17762041 ] Aparna Garg commented on SPARK-45022: - User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/42816 > Provide context for dataset API errors > -- > > Key: SPARK-45022 > URL: https://issues.apache.org/jira/browse/SPARK-45022 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Peter Toth >Priority: Major > > SQL failures already provide nice error context when there is a failure: > {noformat} > org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. > Use `try_divide` to tolerate divisor being 0 and return NULL instead. If > necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. > == SQL(line 1, position 1) == > a / b > ^ > at > org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201) > at > org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala) > ... > {noformat} > We could add a similar user friendly error context to Dataset APIs. > E.g. consider the following Spark app SimpleApp.scala: > {noformat} >1 import org.apache.spark.sql.SparkSession >2 import org.apache.spark.sql.functions._ >3 >4 object SimpleApp { >5def main(args: Array[String]) { >6 val spark = SparkSession.builder.appName("Simple > Application").config("spark.sql.ansi.enabled", true).getOrCreate() >7 import spark.implicits._ >8 >9 val c = col("a") / col("b") > 10 > 11 Seq((1, 0)).toDF("a", "b").select(c).show() > 12 > 13 spark.stop() > 14} > 15 } > {noformat} > then the error context could be: > {noformat} > Exception in thread "main" org.apache.spark.SparkArithmeticException: > [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being > 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to > "false" to bypass this error. > == Dataset == > "div" was called from SimpleApp$.main(SimpleApp.scala:9) > at > org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201) > at > org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:672 > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45077) Upgrade dagre-d3.js from 0.4.3 to 0.6.4
[ https://issues.apache.org/jira/browse/SPARK-45077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-45077: - Labels: pat (was: ) > Upgrade dagre-d3.js from 0.4.3 to 0.6.4 > --- > > Key: SPARK-45077 > URL: https://issues.apache.org/jira/browse/SPARK-45077 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pat > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45077) Upgrade dagre-d3.js from 0.4.3 to 0.6.4
Kent Yao created SPARK-45077: Summary: Upgrade dagre-d3.js from 0.4.3 to 0.6.4 Key: SPARK-45077 URL: https://issues.apache.org/jira/browse/SPARK-45077 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45074) DataFrame.{sort, sortWithinPartitions} support column ordinals
[ https://issues.apache.org/jira/browse/SPARK-45074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45074. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42809 [https://github.com/apache/spark/pull/42809] > DataFrame.{sort, sortWithinPartitions} support column ordinals > -- > > Key: SPARK-45074 > URL: https://issues.apache.org/jira/browse/SPARK-45074 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
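The new DataFrame behavior presumably mirrors the column-ordinal semantics SQL already has in ORDER BY, where an integer literal refers to a select-list position; a SQL illustration of those semantics:
{code:sql}
-- ORDER BY 1 sorts by the first projected column (1-based ordinal);
-- DataFrame.sort with an integer ordinal is meant to match this behavior.
spark-sql (default)> SELECT * FROM VALUES (3, 'c'), (1, 'a') AS tab(id, name) ORDER BY 1;
1	a
3	c
{code}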
[jira] [Assigned] (SPARK-45074) DataFrame.{sort, sortWithinPartitions} support column ordinals
[ https://issues.apache.org/jira/browse/SPARK-45074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45074: - Assignee: Ruifeng Zheng > DataFrame.{sort, sortWithinPartitions} support column ordinals > -- > > Key: SPARK-45074 > URL: https://issues.apache.org/jira/browse/SPARK-45074 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45076) Switch to built-in repeat function
[ https://issues.apache.org/jira/browse/SPARK-45076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45076. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42812 [https://github.com/apache/spark/pull/42812] > Switch to built-in repeat function > -- > > Key: SPARK-45076 > URL: https://issues.apache.org/jira/browse/SPARK-45076 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
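The built-in function referred to here is the standard string repeat; per the ticket title, the pandas-on-Spark code now delegates to it instead of assembling the expression by hand. A quick illustration of its semantics:
{code:sql}
-- Built-in repeat: concatenates the input string with itself n times.
spark-sql (default)> SELECT repeat('ab', 3);
ababab
{code}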
[jira] [Assigned] (SPARK-45076) Switch to built-in repeat function
[ https://issues.apache.org/jira/browse/SPARK-45076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45076: - Assignee: Ruifeng Zheng > Switch to built-in repeat function > -- > > Key: SPARK-45076 > URL: https://issues.apache.org/jira/browse/SPARK-45076 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org