[jira] [Resolved] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-44778. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42435 [https://github.com/apache/spark/pull/42435] > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 4.0.0 > > > Introduce the timediff() function, which takes three arguments: unit, and two > datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
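The ticket itself only adds an alias, but the unit-based difference it aliases can be illustrated outside Spark. A minimal pure-Python sketch of TIMESTAMPDIFF-style semantics (the unit table and function names are illustrative assumptions, not Spark's implementation, and only a few units are shown):

```python
from datetime import datetime

# Hypothetical sketch: count complete units between two datetimes.
UNIT_SECONDS = {"SECOND": 1, "MINUTE": 60, "HOUR": 3600, "DAY": 86400}

def timestampdiff(unit, start, end):
    delta = (end - start).total_seconds()
    return int(delta // UNIT_SECONDS[unit.upper()])

# TIMEDIFF would simply be an alias of the same function:
timediff = timestampdiff

print(timediff("HOUR", datetime(2023, 8, 11, 0, 0), datetime(2023, 8, 11, 5, 30)))  # 5
```

Partial units are truncated here (5.5 hours reports as 5), which mirrors the "complete units elapsed" reading of the description; month/year units would need calendar arithmetic rather than a fixed seconds table.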
[jira] [Updated] (SPARK-44780) Document SQL Session variables
[ https://issues.apache.org/jira/browse/SPARK-44780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Rielau updated SPARK-44780: - Attachment: Screenshot 2023-08-11 at 10.22.55 PM.png Screenshot 2023-08-11 at 10.24.33 PM.png Screenshot 2023-08-11 at 10.26.54 PM.png > Document SQL Session variables > -- > > Key: SPARK-44780 > URL: https://issues.apache.org/jira/browse/SPARK-44780 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.4.2 >Reporter: Serge Rielau >Priority: Major > Attachments: Screenshot 2023-08-11 at 10.22.55 PM.png, Screenshot > 2023-08-11 at 10.24.33 PM.png, Screenshot 2023-08-11 at 10.26.54 PM.png > > > SQL Session variables have been added with: SPARK-42849. > Here we add the docs for it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44719) NoClassDefFoundError when using Hive UDF
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44719. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42446 [https://github.com/apache/spark/pull/42446] > NoClassDefFoundError when using Hive UDF > > > Key: SPARK-44719 > URL: https://issues.apache.org/jira/browse/SPARK-44719 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 4.0.0 > > Attachments: HiveUDFs-1.0-SNAPSHOT.jar > > > How to reproduce: > {noformat} > spark-sql (default)> add jar > /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar; > Time taken: 0.413 seconds > spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as > 'net.petrabarus.hiveudfs.LongToIP'; > Time taken: 0.038 seconds > spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10); > 23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT > long_to_ip(2130706433L) FROM range(10)] > java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory > at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44719) NoClassDefFoundError when using Hive UDF
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-44719: Assignee: Yuming Wang > NoClassDefFoundError when using Hive UDF > > > Key: SPARK-44719 > URL: https://issues.apache.org/jira/browse/SPARK-44719 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Attachments: HiveUDFs-1.0-SNAPSHOT.jar > > > How to reproduce: > {noformat} > spark-sql (default)> add jar > /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar; > Time taken: 0.413 seconds > spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as > 'net.petrabarus.hiveudfs.LongToIP'; > Time taken: 0.038 seconds > spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10); > 23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT > long_to_ip(2130706433L) FROM range(10)] > java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory > at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44781) Runtime filter should support reuse of the exchange if it can reduce the data size of the application side
[ https://issues.apache.org/jira/browse/SPARK-44781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753458#comment-17753458 ] jiaan.geng commented on SPARK-44781: I'm working on it. > Runtime filter should support reuse of the exchange if it can reduce the data > size of the application side > > > Key: SPARK-44781 > URL: https://issues.apache.org/jira/browse/SPARK-44781 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: jiaan.geng >Priority: Major > > Currently, the Spark runtime filter only supports using the subquery on one table. > In fact, we can reuse the exchange, even if it is a shuffle exchange. > If the shuffle exchange comes from a join that has one side with selective > predicates, the results of the join can be used to prune the amount of data > on the application side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44781) Runtime filter should support reuse of the exchange if it can reduce the data size of the application side
jiaan.geng created SPARK-44781: -- Summary: Runtime filter should support reuse of the exchange if it can reduce the data size of the application side Key: SPARK-44781 URL: https://issues.apache.org/jira/browse/SPARK-44781 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: jiaan.geng Currently, the Spark runtime filter only supports using the subquery on one table. In fact, we can reuse the exchange, even if it is a shuffle exchange. If the shuffle exchange comes from a join that has one side with selective predicates, the results of the join can be used to prune the amount of data on the application side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
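The pruning idea behind this request can be sketched with plain Python sets (illustrative only; Spark's actual runtime filters are built as bloom-filter or semi-join subqueries in the optimizer, and none of these names come from the Spark codebase):

```python
# Sketch: a runtime filter built from the selective side of a join is used to
# prune the probe ("application") side before the expensive exchange runs.
def build_runtime_filter(build_rows, key):
    # Collect the join keys that can possibly match.
    return {row[key] for row in build_rows}

def prune(probe_rows, key, runtime_filter):
    # Drop probe rows whose key cannot join; less data flows into the shuffle.
    return [row for row in probe_rows if row[key] in runtime_filter]

build = [{"id": 1}, {"id": 3}]                       # side with selective predicates
probe = [{"id": i, "payload": i * 10} for i in range(6)]
kept = prune(probe, "id", build_runtime_filter(build, "id"))
print(len(kept))  # 2 rows survive instead of 6
```

The ticket's point is that when the build side is itself the output of a (reusable) shuffle exchange, that exchange can feed the filter instead of requiring a separate single-table subquery.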
[jira] [Resolved] (SPARK-44242) Spark job submission fails when the Xmx string appears in a parameter passed via spark.driver.extraJavaOptions
[ https://issues.apache.org/jira/browse/SPARK-44242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-44242. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 41806 [https://github.com/apache/spark/pull/41806] > Spark job submission fails when the Xmx string appears in a parameter > passed via spark.driver.extraJavaOptions > > > Key: SPARK-44242 > URL: https://issues.apache.org/jira/browse/SPARK-44242 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 3.3.2, 3.4.1 >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Major > Fix For: 4.0.0 > > > The spark-submit command fails if the Xmx string is found in any parameter > provided to spark.driver.extraJavaOptions. > For example, running this spark-submit command line > {code:java} > ./bin/spark-submit --class org.apache.spark.examples.SparkPi --conf > "spark.driver.extraJavaOptions=-Dtest=Xmx" > examples/jars/spark-examples_2.12-3.4.1.jar 100{code} > fails due to > {code:java} > Error: Not allowed to specify max heap(Xmx) memory settings through java > options (was -Dtest=Xmx). Use the corresponding --driver-memory or > spark.driver.memory configuration instead.{code} > The check performed in > [https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L314] > seems too broad -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44242) Spark job submission fails when the Xmx string appears in a parameter passed via spark.driver.extraJavaOptions
[ https://issues.apache.org/jira/browse/SPARK-44242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-44242: --- Assignee: Nicolas Fraison > Spark job submission fails when the Xmx string appears in a parameter > passed via spark.driver.extraJavaOptions > > > Key: SPARK-44242 > URL: https://issues.apache.org/jira/browse/SPARK-44242 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 3.3.2, 3.4.1 >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Major > > The spark-submit command fails if the Xmx string is found in any parameter > provided to spark.driver.extraJavaOptions. > For example, running this spark-submit command line > {code:java} > ./bin/spark-submit --class org.apache.spark.examples.SparkPi --conf > "spark.driver.extraJavaOptions=-Dtest=Xmx" > examples/jars/spark-examples_2.12-3.4.1.jar 100{code} > fails due to > {code:java} > Error: Not allowed to specify max heap(Xmx) memory settings through java > options (was -Dtest=Xmx). Use the corresponding --driver-memory or > spark.driver.memory configuration instead.{code} > The check performed in > [https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L314] > seems too broad -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
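The proposed narrowing can be sketched as follows (a hypothetical illustration, not the actual SparkSubmitCommandBuilder logic, which is Java): flag an option only when the option itself sets the max heap, rather than whenever the substring "Xmx" appears anywhere in it. This sketch only handles the `-Xmx` spelling, not other heap-setting flags.

```python
import re

# Hypothetical narrower check: an option sets the max heap only if it
# *starts* with "-Xmx" (e.g. "-Xmx4g"), so "-Dtest=Xmx" is not flagged.
def sets_max_heap(java_option):
    return re.match(r"-Xmx\S*$", java_option) is not None

assert sets_max_heap("-Xmx4g")
assert not sets_max_heap("-Dtest=Xmx")   # the false positive from this report
```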
[jira] [Resolved] (SPARK-43987) Separate finalizeShuffleMerge Processing to Dedicated Thread Pools
[ https://issues.apache.org/jira/browse/SPARK-43987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-43987. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 41489 [https://github.com/apache/spark/pull/41489] > Separate finalizeShuffleMerge Processing to Dedicated Thread Pools > -- > > Key: SPARK-43987 > URL: https://issues.apache.org/jira/browse/SPARK-43987 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.2.0, 3.4.0 >Reporter: SHU WANG >Assignee: SHU WANG >Priority: Critical > Fix For: 4.0.0 > > > In our production environment, _finalizeShuffleMerge_ processing takes longer > (p90 is around 20s) than other RPC requests. This is due to > _finalizeShuffleMerge_ invoking IO operations like truncate and file > open/close. > More importantly, processing these _finalizeShuffleMerge_ requests can block > other critical lightweight messages like authentication, which can cause > authentication timeouts as well as fetch failures. Those timeouts and fetch > failures affect the stability of Spark job executions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43987) Separate finalizeShuffleMerge Processing to Dedicated Thread Pools
[ https://issues.apache.org/jira/browse/SPARK-43987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-43987: --- Assignee: SHU WANG > Separate finalizeShuffleMerge Processing to Dedicated Thread Pools > -- > > Key: SPARK-43987 > URL: https://issues.apache.org/jira/browse/SPARK-43987 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.2.0, 3.4.0 >Reporter: SHU WANG >Assignee: SHU WANG >Priority: Critical > > In our production environment, _finalizeShuffleMerge_ processing takes longer > (p90 is around 20s) than other RPC requests. This is due to > _finalizeShuffleMerge_ invoking IO operations like truncate and file > open/close. > More importantly, processing these _finalizeShuffleMerge_ requests can block > other critical lightweight messages like authentication, which can cause > authentication timeouts as well as fetch failures. Those timeouts and fetch > failures affect the stability of Spark job executions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
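The dedicated-thread-pool idea is routing by message type: heavyweight finalize requests get their own pool so lightweight RPCs are never queued behind them. A toy Python sketch (the external shuffle service is actually Java/Netty; all names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Two pools: lightweight RPCs (auth, heartbeats) never wait behind
# IO-heavy finalizeShuffleMerge work.
rpc_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="rpc")
finalize_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="finalize")

def dispatch(message_type, handler):
    # Route heavy finalize handling to the dedicated pool.
    pool = finalize_pool if message_type == "finalizeShuffleMerge" else rpc_pool
    return pool.submit(handler)

f1 = dispatch("finalizeShuffleMerge", lambda: "merged")   # slow IO work
f2 = dispatch("authenticate", lambda: "ok")               # must stay fast
print(f2.result(), f1.result())  # ok merged
```

With a single shared pool, a burst of finalize requests could occupy every worker and delay authentication past its timeout, which is the failure mode the ticket describes.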
[jira] [Commented] (SPARK-44461) Enable Process Isolation for streaming python worker
[ https://issues.apache.org/jira/browse/SPARK-44461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753427#comment-17753427 ] Hyukjin Kwon commented on SPARK-44461: -- [~rangadi] can we switch the JIRA by switching the description and title? > Enable Process Isolation for streaming python worker > > > Key: SPARK-44461 > URL: https://issues.apache.org/jira/browse/SPARK-44461 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Priority: Major > > Enable PI for Python worker used for foreachBatch() & streaming listener in > Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44780) Document SQL Session variables
[ https://issues.apache.org/jira/browse/SPARK-44780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Rielau updated SPARK-44780: - Summary: Document SQL Session variables (was: Docuement SQL Session variables) > Document SQL Session variables > -- > > Key: SPARK-44780 > URL: https://issues.apache.org/jira/browse/SPARK-44780 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.4.2 >Reporter: Serge Rielau >Priority: Major > > SQL Session variables have been added with: SPARK-42849. > Here we add the docs for it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44780) Docuement SQL Session variables
Serge Rielau created SPARK-44780: Summary: Docuement SQL Session variables Key: SPARK-44780 URL: https://issues.apache.org/jira/browse/SPARK-44780 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 3.4.2 Reporter: Serge Rielau SQL Session variables have been added with: SPARK-42849. Here we add the docs for it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-44778: - Epic Link: (was: SPARK-38783) > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > Introduce the datediff()/date_diff() function, which takes three arguments: > unit, and two datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-44778: - Description: Introduce the timediff() function, which takes three arguments: unit, and two datetime expressions, i.e., {code:sql} datediff(unit, startDatetime, endDatetime) {code} The function can be an alias to timestampdiff(). was: Introduce the datediff()/date_diff() function, which takes three arguments: unit, and two datetime expressions, i.e., {code:sql} datediff(unit, startDatetime, endDatetime) {code} The function can be an alias to timestampdiff(). > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Introduce the timediff() function, which takes three arguments: unit, and two > datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-44778: - Affects Version/s: 4.0.0 (was: 3.3.0) > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Introduce the datediff()/date_diff() function, which takes three arguments: > unit, and two datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-44778: - Fix Version/s: (was: 3.3.0) > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Introduce the datediff()/date_diff() function, which takes three arguments: > unit, and two datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
Max Gekk created SPARK-44778: Summary: Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` Key: SPARK-44778 URL: https://issues.apache.org/jira/browse/SPARK-44778 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.3.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 3.3.0 Introduce the datediff()/date_diff() function, which takes three arguments: unit, and two datetime expressions, i.e., {code:sql} datediff(unit, startDatetime, endDatetime) {code} The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44625) Spark Connect clean up abandoned executions
[ https://issues.apache.org/jira/browse/SPARK-44625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44625. --- Fix Version/s: 3.5.0 Assignee: Juliusz Sompolski Resolution: Fixed > Spark Connect clean up abandoned executions > --- > > Key: SPARK-44625 > URL: https://issues.apache.org/jira/browse/SPARK-44625 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 3.5.0 > > > With reattachable executions, some executions might get orphaned when > ReattachExecute and ReleaseExecute never come. Add a mechanism to track that > and to clean them up. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
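A minimal sketch of such a tracking mechanism (illustrative only; not the Spark Connect implementation, and every name here is an assumption): record when each execution was last touched by a ReattachExecute/ReleaseExecute, and periodically reap anything past a timeout.

```python
import time

# Toy tracker: executions not reattached or released within timeout_s
# are considered abandoned and cleaned up.
class ExecutionTracker:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def touch(self, execution_id, now=None):
        # Called on start, reattach, or release of an execution.
        self.last_seen[execution_id] = now if now is not None else time.monotonic()

    def reap_abandoned(self, now=None):
        # Remove and return every execution idle longer than the timeout.
        now = now if now is not None else time.monotonic()
        dead = [e for e, t in self.last_seen.items() if now - t > self.timeout_s]
        for e in dead:
            del self.last_seen[e]
        return dead

tracker = ExecutionTracker(timeout_s=60)
tracker.touch("exec-1", now=0)
tracker.touch("exec-2", now=50)
print(tracker.reap_abandoned(now=100))  # ['exec-1']
```

In a real server the reaper would run on a background thread and also release any resources (result buffers, jobs) held by the reaped execution.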
[jira] [Created] (SPARK-44777) Allow specifying eagerness in RDD.checkpoint
Emil Ejbyfeldt created SPARK-44777: -- Summary: Allow specifying eagerness in RDD.checkpoint Key: SPARK-44777 URL: https://issues.apache.org/jira/browse/SPARK-44777 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Emil Ejbyfeldt Currently, Dataset.checkpoint takes a boolean to indicate whether the checkpoint should be done eagerly. For the same reasons that one might want to eagerly checkpoint a Dataset, one might want to do the same with an RDD. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44776) Add ProducedRowCount to SparkListenerConnectOperationFinished
Lingkai Kong created SPARK-44776: Summary: Add ProducedRowCount to SparkListenerConnectOperationFinished Key: SPARK-44776 URL: https://issues.apache.org/jira/browse/SPARK-44776 Project: Spark Issue Type: Task Components: Connect Affects Versions: 3.4.1 Reporter: Lingkai Kong As title -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43327) Trigger `committer.setupJob` before plan execution in `FileFormatWriter`
[ https://issues.apache.org/jira/browse/SPARK-43327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718060#comment-17718060 ] ming95 edited comment on SPARK-43327 at 8/11/23 12:58 PM: -- pr : https://github.com/apache/spark/pull/41154 was (Author: zing): pr : https://github.com/apache/spark/pull/41000 > Trigger `committer.setupJob` before plan execution in `FileFormatWriter` > -- > > Key: SPARK-43327 > URL: https://issues.apache.org/jira/browse/SPARK-43327 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 >Reporter: ming95 >Priority: Major > > In this JIRA, the case where `outputOrdering` might not work if AQE is > enabled has been resolved. > https://issues.apache.org/jira/browse/SPARK-40588 > However, since it materializes the AQE plan in advance (triggers > getFinalPhysicalPlan), it may cause committer.setupJob(job) not to execute > when `AdaptiveSparkPlanExec#getFinalPhysicalPlan()` fails with an error. > Normally this step should be executed after committer.setupJob(job). > This may eventually result in the insert-overwrite directory being deleted.
> > {code:java} > import org.apache.hadoop.fs.{FileSystem, Path} > import org.apache.spark.sql.QueryTest > import org.apache.spark.sql.catalyst.TableIdentifier > sql("CREATE TABLE IF NOT EXISTS spark32_overwrite(amt1 int) STORED AS ORC") > sql("CREATE TABLE IF NOT EXISTS spark32_overwrite2(amt1 long) STORED AS ORC") > sql("INSERT OVERWRITE TABLE spark32_overwrite2 select 644164") > sql("set spark.sql.ansi.enabled=true") > val loc = > > spark.sessionState.catalog.getTableMetadata(TableIdentifier("spark32_overwrite")).location > val fs = FileSystem.get(loc, spark.sparkContext.hadoopConfiguration) > println("Location exists: " + fs.exists(new Path(loc))) > try { > sql("INSERT OVERWRITE TABLE spark32_overwrite select amt1 from " + > "(select cast(amt1 as int) as amt1 from spark32_overwrite2 distribute by > amt1)") > } finally { > println("Location exists: " + fs.exists(new Path(loc))) > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
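The ordering bug above reduces to a small sequence of events (a toy sketch, not FileFormatWriter code): if the job is set up before the plan is materialized, a planning failure can still be handled by the committer instead of leaving the output location in a deleted state.

```python
# Toy illustration of the fix: setupJob runs first, so a failure while
# materializing the AQE plan can be aborted cleanly.
events = []

def setup_job():
    events.append("setupJob")

def materialize_plan(fail=False):
    if fail:
        raise RuntimeError("AQE planning failed")
    events.append("materialize")

try:
    setup_job()                    # fixed ordering: committer hook first
    materialize_plan(fail=True)    # getFinalPhysicalPlan() blows up
except RuntimeError:
    events.append("abortJob")      # committer can still clean up

print(events)  # ['setupJob', 'abortJob']
```

In the broken ordering, the exception would fire before `setupJob` ever ran, so there would be no committer state to abort against, which is how the insert-overwrite directory could end up deleted.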
[jira] [Resolved] (SPARK-44761) Add DataStreamWriter.foreachBatch(org.apache.spark.api.java.function.VoidFunction2) signature
[ https://issues.apache.org/jira/browse/SPARK-44761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44761. --- Fix Version/s: 3.5.0 Resolution: Fixed > Add > DataStreamWriter.foreachBatch(org.apache.spark.api.java.function.VoidFunction2) > signature > -- > > Key: SPARK-44761 > URL: https://issues.apache.org/jira/browse/SPARK-44761 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr
[ https://issues.apache.org/jira/browse/SPARK-44760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44760. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42429 [https://github.com/apache/spark/pull/42429] > Index Out Of Bound for JIRA resolution in merge_spark_pr > > > Key: SPARK-44760 > URL: https://issues.apache.org/jira/browse/SPARK-44760 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Fix For: 4.0.0 > > > I -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44775) Add missing version information in DataFrame APIs
Ruifeng Zheng created SPARK-44775: - Summary: Add missing version information in DataFrame APIs Key: SPARK-44775 URL: https://issues.apache.org/jira/browse/SPARK-44775 Project: Spark Issue Type: Improvement Components: Documentation, PySpark Affects Versions: 3.5.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44774) SaveMode.ErrorIfExists does not work with kafka-sql
[ https://issues.apache.org/jira/browse/SPARK-44774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Martynov updated SPARK-44774: --- Description: I'm trying to write a batch dataframe to a Kafka topic with {{mode="error"}}, but when the topic exists it does not raise an exception. Instead it appends data to the topic. Steps to reproduce: 1. Start Kafka: docker-compose.yml {code:yaml} version: '3.9' services: zookeeper: image: bitnami/zookeeper:3.8 environment: ALLOW_ANONYMOUS_LOGIN: 'yes' kafka: image: bitnami/kafka:latest restart: unless-stopped ports: - 9093:9093 environment: ALLOW_PLAINTEXT_LISTENER: 'yes' KAFKA_ENABLE_KRAFT: 'no' KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181 KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093 KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093 KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true' depends_on: - zookeeper {code} {code:bash} docker-compose up -d {code} 2. Start Spark session: {code:bash} pip install pyspark[sql]==3.4.1 {code} {code:python} from pyspark.sql import SparkSession spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate() {code} 3. Create DataFrame and write it to Kafka. First write using {{mode="append"}} to create the topic, then with {{mode="error"}} to raise because the topic already exists: {code} df = spark.createDataFrame([{"value": "string"}]) df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("append").save() # no exception is raised df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("error").save() {code} 4. 
Check topic content - 2 rows are added to topic instead of one: {code:python} spark.read.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("subscribe", "new_topic").load().show(10, False) {code} {code} ++---+-+-+--+---+-+ |key |value |topic|partition|offset|timestamp |timestampType| ++---+-+-+--+---+-+ |null|[73 74 72 69 6E 67]|new_topic|0|0 |2023-08-11 09:39:35.813|0 | |null|[73 74 72 69 6E 67]|new_topic|0|1 |2023-08-11 09:39:36.122|0 | ++---+-+-+--+---+-+ {code} It looks like mode is checked by KafkaSourceProvider, but is not used at all: https://github.com/apache/spark/blob/v3.4.1/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L172-L178 So data is always appended to topic. was: I' trying to write batch dataframe to Kafka topic with {{mode="error"}}, but when topic exists it does not raise exception. Instead it appends data to a topic. Steps to reproduce: 1. Start Kafka: docker-compose.yml {code:yaml} version: '3.9' services: zookeeper: image: bitnami/zookeeper:3.8 environment: ALLOW_ANONYMOUS_LOGIN: 'yes' kafka: image: bitnami/kafka:latest restart: unless-stopped ports: - 9093:9093 environment: ALLOW_PLAINTEXT_LISTENER: 'yes' KAFKA_ENABLE_KRAFT: 'no' KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181 KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093 KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093 KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true' depends_on: - zookeeper {code} {code:bash} docker-compose up -d {code} 2. 
Start Spark session: {code:bash} pip install pyspark[sql]==3.4.1 {code} {code:python} from pyspark.sql import SparkSession spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate() {code} 3. Create DataFrame and write it to Kafka. First write using {{mode="append"}} to create topic, then with {{mode="error"}} to raise because topic already exist: {code} df = spark.createDataFrame([{"value": "string"}]) df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("append").save() # no exception is raised
[jira] [Updated] (SPARK-44774) SaveMode.ErrorIfExists does not work with kafka-sql
[ https://issues.apache.org/jira/browse/SPARK-44774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Martynov updated SPARK-44774:
-----------------------------------
Description:

I'm trying to write a batch DataFrame to a Kafka topic with {{mode="error"}}, but when the topic exists it does not raise an exception. Instead, it appends data to the topic.

Steps to reproduce:

1. Start Kafka:

docker-compose.yml
{code:yaml}
version: '3.9'
services:
  zookeeper:
    image: bitnami/zookeeper:3.8
    environment:
      ALLOW_ANONYMOUS_LOGIN: 'yes'

  kafka:
    image: bitnami/kafka:latest
    restart: unless-stopped
    ports:
      - 9093:9093
    environment:
      ALLOW_PLAINTEXT_LISTENER: 'yes'
      KAFKA_ENABLE_KRAFT: 'yes'
      KAFKA_CLIENT_USERS: onetl
      KAFKA_CLIENT_PASSWORDS: uufoFae9sahSoidoo0eagaidaoreif6z
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS
      KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT
      KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true'
    depends_on:
      - zookeeper
{code}
{code:bash}
docker-compose up -d
{code}

2. Start a Spark session:
{code:bash}
pip install pyspark[sql]==3.4.1
{code}
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate()
{code}

3. Create a DataFrame and write it to Kafka. First write with {{mode="append"}} to create the topic, then with {{mode="error"}}, which should raise because the topic already exists:
{code:python}
df = spark.createDataFrame([{"value": "string"}])
df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("append").save()

# no exception is raised
df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("error").save()
{code}

4. Check the topic content - 2 rows were added to the topic instead of one:
{code:python}
spark.read.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("subscribe", "new_topic").load().show(10, False)
{code}
{code}
+----+-------------------+---------+---------+------+-----------------------+-------------+
|key |value              |topic    |partition|offset|timestamp              |timestampType|
+----+-------------------+---------+---------+------+-----------------------+-------------+
|null|[73 74 72 69 6E 67]|new_topic|0        |0     |2023-08-11 09:39:35.813|0            |
|null|[73 74 72 69 6E 67]|new_topic|0        |1     |2023-08-11 09:39:36.122|0            |
+----+-------------------+---------+---------+------+-----------------------+-------------+
{code}

It looks like the mode is checked by KafkaSourceProvider but never actually used:
https://github.com/apache/spark/blob/6b1ff22dde1ead51cbf370be6e48a802daae58b6/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L172-L178
So data is always appended to the topic.

was:

I'm trying to write a batch DataFrame to a Kafka topic with {{mode="error"}}, but when the topic exists it does not raise an exception - instead it appends data to the topic.

Steps to reproduce:

1. Start Kafka:

docker-compose.yml
{code:yaml}
version: '3.9'
services:
  zookeeper:
    image: bitnami/zookeeper:3.8
    environment:
      ALLOW_ANONYMOUS_LOGIN: 'yes'

  kafka:
    image: bitnami/kafka:latest
    restart: unless-stopped
    ports:
      - 9093:9093
    environment:
      ALLOW_PLAINTEXT_LISTENER: 'yes'
      KAFKA_ENABLE_KRAFT: 'yes'
      KAFKA_CLIENT_USERS: onetl
      KAFKA_CLIENT_PASSWORDS: uufoFae9sahSoidoo0eagaidaoreif6z
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS
      KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT
      KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true'
    depends_on:
      - zookeeper
{code}
{code:bash}
docker-compose up -d
{code}

2. Start a Spark session:
{code:bash}
pip install pyspark[sql]==3.4.1
{code}
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate()
{code}

3. Create a DataFrame and write it to Kafka. First write with {{mode="append"}} to create the topic, then with {{mode="error"}}, which should raise because the topic
[jira] [Assigned] (SPARK-43477) Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43477: - Assignee: Haejoon Lee > Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0. > - > > Key: SPARK-43477 > URL: https://issues.apache.org/jira/browse/SPARK-43477 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43478) Enable SeriesStringTests.test_string_split for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43478: - Assignee: Haejoon Lee > Enable SeriesStringTests.test_string_split for pandas 2.0.0. > > > Key: SPARK-43478 > URL: https://issues.apache.org/jira/browse/SPARK-43478 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable SeriesStringTests.test_string_split for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43478) Enable SeriesStringTests.test_string_split for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43478. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42312 [https://github.com/apache/spark/pull/42312] > Enable SeriesStringTests.test_string_split for pandas 2.0.0. > > > Key: SPARK-43478 > URL: https://issues.apache.org/jira/browse/SPARK-43478 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable SeriesStringTests.test_string_split for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43476) Enable SeriesStringTests.test_string_replace for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43476: - Assignee: Haejoon Lee > Enable SeriesStringTests.test_string_replace for pandas 2.0.0. > -- > > Key: SPARK-43476 > URL: https://issues.apache.org/jira/browse/SPARK-43476 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable SeriesStringTests.test_string_replace for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43477) Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43477. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42312 [https://github.com/apache/spark/pull/42312] > Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0. > - > > Key: SPARK-43477 > URL: https://issues.apache.org/jira/browse/SPARK-43477 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44774) SaveMode.ErrorIfExists does not work with kafka-sql
Maxim Martynov created SPARK-44774:
-----------------------------------

Summary: SaveMode.ErrorIfExists does not work with kafka-sql
Key: SPARK-44774
URL: https://issues.apache.org/jira/browse/SPARK-44774
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.1
Reporter: Maxim Martynov

I'm trying to write a batch DataFrame to a Kafka topic with {{mode="error"}}, but when the topic exists it does not raise an exception - instead it appends data to the topic.

Steps to reproduce:

1. Start Kafka:

docker-compose.yml
{code:yaml}
version: '3.9'
services:
  zookeeper:
    image: bitnami/zookeeper:3.8
    environment:
      ALLOW_ANONYMOUS_LOGIN: 'yes'

  kafka:
    image: bitnami/kafka:latest
    restart: unless-stopped
    ports:
      - 9093:9093
    environment:
      ALLOW_PLAINTEXT_LISTENER: 'yes'
      KAFKA_ENABLE_KRAFT: 'yes'
      KAFKA_CLIENT_USERS: onetl
      KAFKA_CLIENT_PASSWORDS: uufoFae9sahSoidoo0eagaidaoreif6z
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS
      KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT
      KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true'
    depends_on:
      - zookeeper
{code}
{code:bash}
docker-compose up -d
{code}

2. Start a Spark session:
{code:bash}
pip install pyspark[sql]==3.4.1
{code}
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate()
{code}

3. Create a DataFrame and write it to Kafka. First write with {{mode="append"}} to create the topic, then with {{mode="error"}}, which should raise because the topic already exists:
{code:python}
df = spark.createDataFrame([{"value": "string"}])
df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("append").save()

# no exception is raised
df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("error").save()
{code}

4. Check the topic content - 2 rows were added to the topic instead of one:
{code:python}
spark.read.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("subscribe", "new_topic").load().show(10, False)
{code}
{code}
+----+-------------------+---------+---------+------+-----------------------+-------------+
|key |value              |topic    |partition|offset|timestamp              |timestampType|
+----+-------------------+---------+---------+------+-----------------------+-------------+
|null|[73 74 72 69 6E 67]|new_topic|0        |0     |2023-08-11 09:39:35.813|0            |
|null|[73 74 72 69 6E 67]|new_topic|0        |1     |2023-08-11 09:39:36.122|0            |
+----+-------------------+---------+---------+------+-----------------------+-------------+
{code}

It looks like the mode is checked by KafkaSourceProvider but never actually used:
https://github.com/apache/spark/blob/6b1ff22dde1ead51cbf370be6e48a802daae58b6/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L172-L178
So data is always appended to the topic.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
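To make the expected behavior concrete, the SaveMode semantics the reporter is asking for can be sketched as a plain function. This is a hypothetical helper (neither Spark nor Kafka connector code): {{error}}/{{errorifexists}} should fail when the topic exists, {{ignore}} should skip the write, and only {{append}} should add rows.

```python
# Hypothetical sketch of the SaveMode semantics SPARK-44774 expects from
# the Kafka sink; names and return values are illustrative only.
def resolve_kafka_write(mode: str, topic_exists: bool) -> str:
    mode = mode.lower()
    if mode not in {"append", "overwrite", "error", "errorifexists", "ignore"}:
        raise ValueError(f"unknown save mode: {mode}")
    if topic_exists:
        if mode in {"error", "errorifexists"}:
            # The branch the reporter expects to be taken; the current
            # connector silently appends instead.
            raise RuntimeError(f"topic already exists and mode is '{mode}'")
        if mode == "ignore":
            return "skip"
    return "write"
```

Per the linked KafkaSourceProvider code, the connector validates the mode but then performs the same append regardless, which is why both writes in step 3 succeed.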
[jira] [Resolved] (SPARK-43476) Enable SeriesStringTests.test_string_replace for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43476. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42312 [https://github.com/apache/spark/pull/42312] > Enable SeriesStringTests.test_string_replace for pandas 2.0.0. > -- > > Key: SPARK-43476 > URL: https://issues.apache.org/jira/browse/SPARK-43476 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable SeriesStringTests.test_string_replace for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44731) Support 'spark.sql.timestampType' in Python Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-44731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-44731. --- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42445 [https://github.com/apache/spark/pull/42445] > Support 'spark.sql.timestampType' in Python Spark Connect client > > > Key: SPARK-44731 > URL: https://issues.apache.org/jira/browse/SPARK-44731 > Project: Spark > Issue Type: Task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > If a Spark session enables 'spark.sql.timestampType', datetime should be > inferred as the TimestampNTZ type. However, this isn't implemented yet on the Python > client side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
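The inference rule described in this ticket can be sketched as follows. The function and type names are illustrative placeholders, not the real Spark Connect client API: when the session prefers TIMESTAMP_NTZ, a naive datetime (no tzinfo) should be inferred as TimestampNTZ, otherwise as the regular timestamp type.

```python
from datetime import datetime, timezone

# Illustrative sketch of the intended inference, assuming a boolean flag
# derived from the 'spark.sql.timestampType' session conf; names are
# placeholders, not the actual client implementation.
def infer_timestamp_type(value: datetime, prefer_ntz: bool) -> str:
    if prefer_ntz and value.tzinfo is None:
        return "TimestampNTZType"  # no time zone attached
    return "TimestampType"         # zone-aware (or NTZ not preferred)

naive = datetime(2023, 8, 11, 9, 39, 35)
aware = datetime(2023, 8, 11, 9, 39, 35, tzinfo=timezone.utc)
```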
[jira] [Assigned] (SPARK-44731) Support 'spark.sql.timestampType' in Python Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-44731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-44731: - Assignee: Hyukjin Kwon > Support 'spark.sql.timestampType' in Python Spark Connect client > > > Key: SPARK-44731 > URL: https://issues.apache.org/jira/browse/SPARK-44731 > Project: Spark > Issue Type: Task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > > If a Spark session enables 'spark.sql.timestampType', datetime should be > inferred as the TimestampNTZ type. However, this isn't implemented yet on the Python > client side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44773) Code-gen CodegenFallback expression in WholeStageCodegen if possible
Wan Kun created SPARK-44773:
----------------------------

Summary: Code-gen CodegenFallback expression in WholeStageCodegen if possible
Key: SPARK-44773
URL: https://issues.apache.org/jira/browse/SPARK-44773
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.0
Reporter: Wan Kun

Currently, neither the WholeStageCodegen framework nor the SubExpressionElimination framework supports CodegenFallback expressions, even though a CodegenFallback expression that implements the nullSafeEval method could be code-generated just like common expressions. Today such expressions are always executed in a new SpecificUnsafeProjection class, so their sub-expressions cannot be eliminated.

For example:

SQL:
{code:sql}
SELECT from_json(regexp_replace(s, 'a', 'x'), 'x INT, b DOUBLE').x,
       from_json(regexp_replace(s, 'a', 'x'), 'x INT, b DOUBLE').b
FROM values('{"a":1, "b":0.8}') t(s)
{code}

Plan:
{code:java}
*(1) Project [from_json(StructField(x,IntegerType,true), regexp_replace(s#218, a, x, 1), Some(America/Los_Angeles)).x AS from_json(regexp_replace(s, a, x, 1)).x#219, from_json(StructField(b,DoubleType,true), regexp_replace(s#218, a, x, 1), Some(America/Los_Angeles)).b AS from_json(regexp_replace(s, a, x, 1)).b#220]
+- *(1) LocalTableScan [s#218]
{code}

Because org.apache.spark.sql.catalyst.expressions.JsonToStructs is a CodegenFallback expression, the result of {*}regexp_replace(s, 'a', 'x'){*} cannot be reused. If we support code-gen for JsonToStructs in the WholeStageCodegen framework, the result of {*}regexp_replace(s, 'a', 'x'){*} can be reused.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
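To see why sub-expression elimination matters here, consider a small stand-in in Python (json/re stand in for from_json/regexp_replace; this is not Spark code). An eliminated plan would evaluate the shared inner expression once and reuse the parsed result for both field accesses:

```python
import json
import re

# The SQL above evaluates regexp_replace(s, 'a', 'x') twice, once per
# from_json(...).field access. With sub-expression elimination it would
# behave like this: one replace, one parse, two cheap field reads.
s = '{"a":1, "b":0.8}'
replaced = re.sub("a", "x", s)  # shared subexpression, evaluated once
row = json.loads(replaced)      # stand-in for from_json(..., 'x INT, b DOUBLE')
x, b = row["x"], row["b"]       # both field accesses reuse the single parse
```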
[jira] [Assigned] (SPARK-44770) Add a displayOrder variable to WebUITab to specify the order in which tabs appear
[ https://issues.apache.org/jira/browse/SPARK-44770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-44770: Assignee: Jason Li > Add a displayOrder variable to WebUITab to specify the order in which tabs > appear > - > > Key: SPARK-44770 > URL: https://issues.apache.org/jira/browse/SPARK-44770 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.5.1 >Reporter: Jason Li >Assignee: Jason Li >Priority: Major > > Add a displayOrder variable to WebUITab to specify the order in which tabs > appear. Currently, the tabs are ordered by when they get attached, which > isn't always desired. The default is MIN_VALUE, meaning if it's not > specified, it will appear in the order added before any tabs with a > non-default displayOrder. For example, we would like to have the SQL Tab > appear before the Connect tab; however, based on the code flow, the Connect > tab will be attached first and with the current logic, that tab would also > appear first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44770) Add a displayOrder variable to WebUITab to specify the order in which tabs appear
[ https://issues.apache.org/jira/browse/SPARK-44770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44770. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42442 [https://github.com/apache/spark/pull/42442] > Add a displayOrder variable to WebUITab to specify the order in which tabs > appear > - > > Key: SPARK-44770 > URL: https://issues.apache.org/jira/browse/SPARK-44770 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.5.1 >Reporter: Jason Li >Assignee: Jason Li >Priority: Major > Fix For: 4.0.0 > > > Add a displayOrder variable to WebUITab to specify the order in which tabs > appear. Currently, the tabs are ordered by when they get attached, which > isn't always desired. The default is MIN_VALUE, meaning if it's not > specified, it will appear in the order added before any tabs with a > non-default displayOrder. For example, we would like to have the SQL Tab > appear before the Connect tab; however, based on the code flow, the Connect > tab will be attached first and with the current logic, that tab would also > appear first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
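The ordering rule described in this ticket can be sketched with a stable sort. The field and tab names below are illustrative, not the actual WebUITab API: tabs that keep the MIN_VALUE default preserve their attach order, and a tab with an explicit displayOrder sorts after them.

```python
MIN_VALUE = -(2 ** 31)  # mirrors Scala's Int.MinValue default

# Tabs listed in attach order; "Connect" attaches first but is pushed
# later via an explicit display order (hypothetical values).
tabs = [
    {"name": "Connect", "display_order": 1},
    {"name": "SQL", "display_order": MIN_VALUE},
    {"name": "Jobs", "display_order": MIN_VALUE},
]

# Python's sort is stable, so tabs sharing the default keep attach order.
ordered = [t["name"] for t in sorted(tabs, key=lambda t: t["display_order"])]
```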
[jira] [Assigned] (SPARK-44727) Improve the error message for dynamic allocation conditions
[ https://issues.apache.org/jira/browse/SPARK-44727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-44727: Assignee: Cheng Pan > Improve the error message for dynamic allocation conditions > --- > > Key: SPARK-44727 > URL: https://issues.apache.org/jira/browse/SPARK-44727 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44727) Improve the error message for dynamic allocation conditions
[ https://issues.apache.org/jira/browse/SPARK-44727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44727. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42404 [https://github.com/apache/spark/pull/42404] > Improve the error message for dynamic allocation conditions > --- > > Key: SPARK-44727 > URL: https://issues.apache.org/jira/browse/SPARK-44727 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.5.0, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44737) Should not display json format errors on SQL page for non-SparkThrowables
[ https://issues.apache.org/jira/browse/SPARK-44737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44737. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42407 [https://github.com/apache/spark/pull/42407] > Should not display json format errors on SQL page for non-SparkThrowables > - > > Key: SPARK-44737 > URL: https://issues.apache.org/jira/browse/SPARK-44737 > Project: Spark > Issue Type: Bug > Components: SQL, Web UI >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.5.0, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44737) Should not display json format errors on SQL page for non-SparkThrowables
[ https://issues.apache.org/jira/browse/SPARK-44737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-44737: Assignee: Kent Yao > Should not display json format errors on SQL page for non-SparkThrowables > - > > Key: SPARK-44737 > URL: https://issues.apache.org/jira/browse/SPARK-44737 > Project: Spark > Issue Type: Bug > Components: SQL, Web UI >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org