[jira] [Resolved] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-44778. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42435 [https://github.com/apache/spark/pull/42435] > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 4.0.0 > > > Introduce the timediff() function, which takes three arguments: unit, and two > datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
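The ticket itself only adds an alias, but the unit-based difference it aliases can be illustrated outside Spark. A minimal pure-Python sketch of TIMESTAMPDIFF-style semantics (the unit table and function names are illustrative assumptions, not Spark's implementation, and only a few units are shown):

```python
from datetime import datetime

# Hypothetical sketch: count complete units between two datetimes.
UNIT_SECONDS = {"SECOND": 1, "MINUTE": 60, "HOUR": 3600, "DAY": 86400}

def timestampdiff(unit, start, end):
    delta = (end - start).total_seconds()
    return int(delta // UNIT_SECONDS[unit.upper()])

# TIMEDIFF would simply be an alias of the same function:
timediff = timestampdiff

print(timediff("HOUR", datetime(2023, 8, 11, 0, 0), datetime(2023, 8, 11, 5, 30)))  # 5
```

Partial units are truncated here (5.5 hours reports as 5), which mirrors the "complete units elapsed" reading of the description; month/year units would need calendar arithmetic rather than a fixed seconds table.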
[jira] [Updated] (SPARK-44780) Document SQL Session variables
[ https://issues.apache.org/jira/browse/SPARK-44780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Rielau updated SPARK-44780: - Attachment: Screenshot 2023-08-11 at 10.22.55 PM.png Screenshot 2023-08-11 at 10.24.33 PM.png Screenshot 2023-08-11 at 10.26.54 PM.png > Document SQL Session variables > -- > > Key: SPARK-44780 > URL: https://issues.apache.org/jira/browse/SPARK-44780 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.4.2 >Reporter: Serge Rielau >Priority: Major > Attachments: Screenshot 2023-08-11 at 10.22.55 PM.png, Screenshot > 2023-08-11 at 10.24.33 PM.png, Screenshot 2023-08-11 at 10.26.54 PM.png > > > SQL Session variables have been added with: SPARK-42849. > Here we add the docs for it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44719) NoClassDefFoundError when using Hive UDF
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44719. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42446 [https://github.com/apache/spark/pull/42446] > NoClassDefFoundError when using Hive UDF > > > Key: SPARK-44719 > URL: https://issues.apache.org/jira/browse/SPARK-44719 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 4.0.0 > > Attachments: HiveUDFs-1.0-SNAPSHOT.jar > > > How to reproduce: > {noformat} > spark-sql (default)> add jar > /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar; > Time taken: 0.413 seconds > spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as > 'net.petrabarus.hiveudfs.LongToIP'; > Time taken: 0.038 seconds > spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10); > 23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT > long_to_ip(2130706433L) FROM range(10)] > java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory > at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44719) NoClassDefFoundError when using Hive UDF
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-44719: Assignee: Yuming Wang > NoClassDefFoundError when using Hive UDF > > > Key: SPARK-44719 > URL: https://issues.apache.org/jira/browse/SPARK-44719 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Attachments: HiveUDFs-1.0-SNAPSHOT.jar > > > How to reproduce: > {noformat} > spark-sql (default)> add jar > /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar; > Time taken: 0.413 seconds > spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as > 'net.petrabarus.hiveudfs.LongToIP'; > Time taken: 0.038 seconds > spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10); > 23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT > long_to_ip(2130706433L) FROM range(10)] > java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory > at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44781) Runtime filter should support reuse of the exchange if it can reduce the data size of the application side
[ https://issues.apache.org/jira/browse/SPARK-44781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753458#comment-17753458 ] jiaan.geng commented on SPARK-44781: I'm working on it. > Runtime filter should support reuse of the exchange if it can reduce the data > size of the application side > > > Key: SPARK-44781 > URL: https://issues.apache.org/jira/browse/SPARK-44781 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: jiaan.geng >Priority: Major > > Currently, the Spark runtime filter only supports using the subquery on one table. > In fact, we can reuse the exchange, even if it is a shuffle exchange. > If the shuffle exchange comes from a join that has one side with selective > predicates, the results of the join can be used to prune the amount of data > on the application side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44781) Runtime filter should support reuse of the exchange if it can reduce the data size of the application side
jiaan.geng created SPARK-44781: -- Summary: Runtime filter should support reuse of the exchange if it can reduce the data size of the application side Key: SPARK-44781 URL: https://issues.apache.org/jira/browse/SPARK-44781 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: jiaan.geng Currently, the Spark runtime filter only supports using the subquery on one table. In fact, we can reuse the exchange, even if it is a shuffle exchange. If the shuffle exchange comes from a join that has one side with selective predicates, the results of the join can be used to prune the amount of data on the application side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
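The pruning idea behind this request can be sketched with plain Python sets (illustrative only; Spark's actual runtime filters are built as bloom-filter or semi-join subqueries in the optimizer, and none of these names come from the Spark codebase):

```python
# Sketch: a runtime filter built from the selective side of a join is used to
# prune the probe ("application") side before the expensive exchange runs.
def build_runtime_filter(build_rows, key):
    # Collect the join keys that can possibly match.
    return {row[key] for row in build_rows}

def prune(probe_rows, key, runtime_filter):
    # Drop probe rows whose key cannot join; less data flows into the shuffle.
    return [row for row in probe_rows if row[key] in runtime_filter]

build = [{"id": 1}, {"id": 3}]                       # side with selective predicates
probe = [{"id": i, "payload": i * 10} for i in range(6)]
kept = prune(probe, "id", build_runtime_filter(build, "id"))
print(len(kept))  # 2 rows survive instead of 6
```

The ticket's point is that when the build side is itself the output of a (reusable) shuffle exchange, that exchange can feed the filter instead of requiring a separate single-table subquery.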
[jira] [Resolved] (SPARK-44242) Spark job submission fails when the Xmx string appears in a parameter passed via spark.driver.extraJavaOptions
[ https://issues.apache.org/jira/browse/SPARK-44242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-44242. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 41806 [https://github.com/apache/spark/pull/41806] > Spark job submission fails when the Xmx string appears in a parameter > passed via spark.driver.extraJavaOptions > > > Key: SPARK-44242 > URL: https://issues.apache.org/jira/browse/SPARK-44242 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 3.3.2, 3.4.1 >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Major > Fix For: 4.0.0 > > > The spark-submit command fails if the Xmx string is found in any parameter > provided to spark.driver.extraJavaOptions. > For example, running this spark-submit command line > {code:java} > ./bin/spark-submit --class org.apache.spark.examples.SparkPi --conf > "spark.driver.extraJavaOptions=-Dtest=Xmx" > examples/jars/spark-examples_2.12-3.4.1.jar 100{code} > fails due to > {code:java} > Error: Not allowed to specify max heap(Xmx) memory settings through java > options (was -Dtest=Xmx). Use the corresponding --driver-memory or > spark.driver.memory configuration instead.{code} > The check performed in > [https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L314] > seems too broad -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44242) Spark job submission fails when the Xmx string appears in a parameter passed via spark.driver.extraJavaOptions
[ https://issues.apache.org/jira/browse/SPARK-44242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-44242: --- Assignee: Nicolas Fraison > Spark job submission fails when the Xmx string appears in a parameter > passed via spark.driver.extraJavaOptions > > > Key: SPARK-44242 > URL: https://issues.apache.org/jira/browse/SPARK-44242 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 3.3.2, 3.4.1 >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Major > > The spark-submit command fails if the Xmx string is found in any parameter > provided to spark.driver.extraJavaOptions. > For example, running this spark-submit command line > {code:java} > ./bin/spark-submit --class org.apache.spark.examples.SparkPi --conf > "spark.driver.extraJavaOptions=-Dtest=Xmx" > examples/jars/spark-examples_2.12-3.4.1.jar 100{code} > fails due to > {code:java} > Error: Not allowed to specify max heap(Xmx) memory settings through java > options (was -Dtest=Xmx). Use the corresponding --driver-memory or > spark.driver.memory configuration instead.{code} > The check performed in > [https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L314] > seems too broad -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
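The proposed narrowing can be sketched as follows (a hypothetical illustration, not the actual SparkSubmitCommandBuilder logic, which is Java): flag an option only when the option itself sets the max heap, rather than whenever the substring "Xmx" appears anywhere in it. This sketch only handles the `-Xmx` spelling, not other heap-setting flags.

```python
import re

# Hypothetical narrower check: an option sets the max heap only if it
# *starts* with "-Xmx" (e.g. "-Xmx4g"), so "-Dtest=Xmx" is not flagged.
def sets_max_heap(java_option):
    return re.match(r"-Xmx\S*$", java_option) is not None

assert sets_max_heap("-Xmx4g")
assert not sets_max_heap("-Dtest=Xmx")   # the false positive from this report
```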
[jira] [Resolved] (SPARK-43987) Separate finalizeShuffleMerge Processing to Dedicated Thread Pools
[ https://issues.apache.org/jira/browse/SPARK-43987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-43987. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 41489 [https://github.com/apache/spark/pull/41489] > Separate finalizeShuffleMerge Processing to Dedicated Thread Pools > -- > > Key: SPARK-43987 > URL: https://issues.apache.org/jira/browse/SPARK-43987 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.2.0, 3.4.0 >Reporter: SHU WANG >Assignee: SHU WANG >Priority: Critical > Fix For: 4.0.0 > > > In our production environment, _finalizeShuffleMerge_ processing takes longer > (p90 is around 20s) than other RPC requests. This is due to > _finalizeShuffleMerge_ invoking IO operations like truncate and file > open/close. > More importantly, processing these _finalizeShuffleMerge_ requests can block > other critical lightweight messages like authentication, which can cause > authentication timeouts as well as fetch failures. Those timeouts and fetch > failures affect the stability of Spark job executions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43987) Separate finalizeShuffleMerge Processing to Dedicated Thread Pools
[ https://issues.apache.org/jira/browse/SPARK-43987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-43987: --- Assignee: SHU WANG > Separate finalizeShuffleMerge Processing to Dedicated Thread Pools > -- > > Key: SPARK-43987 > URL: https://issues.apache.org/jira/browse/SPARK-43987 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.2.0, 3.4.0 >Reporter: SHU WANG >Assignee: SHU WANG >Priority: Critical > > In our production environment, _finalizeShuffleMerge_ processing takes longer > (p90 is around 20s) than other RPC requests. This is due to > _finalizeShuffleMerge_ invoking IO operations like truncate and file > open/close. > More importantly, processing these _finalizeShuffleMerge_ requests can block > other critical lightweight messages like authentication, which can cause > authentication timeouts as well as fetch failures. Those timeouts and fetch > failures affect the stability of Spark job executions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
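The dedicated-thread-pool idea is routing by message type: heavyweight finalize requests get their own pool so lightweight RPCs are never queued behind them. A toy Python sketch (the external shuffle service is actually Java/Netty; all names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Two pools: lightweight RPCs (auth, heartbeats) never wait behind
# IO-heavy finalizeShuffleMerge work.
rpc_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="rpc")
finalize_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="finalize")

def dispatch(message_type, handler):
    # Route heavy finalize handling to the dedicated pool.
    pool = finalize_pool if message_type == "finalizeShuffleMerge" else rpc_pool
    return pool.submit(handler)

f1 = dispatch("finalizeShuffleMerge", lambda: "merged")   # slow IO work
f2 = dispatch("authenticate", lambda: "ok")               # must stay fast
print(f2.result(), f1.result())  # ok merged
```

With a single shared pool, a burst of finalize requests could occupy every worker and delay authentication past its timeout, which is the failure mode the ticket describes.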
[jira] [Commented] (SPARK-44461) Enable Process Isolation for streaming python worker
[ https://issues.apache.org/jira/browse/SPARK-44461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753427#comment-17753427 ] Hyukjin Kwon commented on SPARK-44461: -- [~rangadi] can we switch the JIRA by switching the description and title? > Enable Process Isolation for streaming python worker > > > Key: SPARK-44461 > URL: https://issues.apache.org/jira/browse/SPARK-44461 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Priority: Major > > Enable PI for Python worker used for foreachBatch() & streaming listener in > Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44780) Document SQL Session variables
[ https://issues.apache.org/jira/browse/SPARK-44780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Rielau updated SPARK-44780: - Summary: Document SQL Session variables (was: Docuement SQL Session variables) > Document SQL Session variables > -- > > Key: SPARK-44780 > URL: https://issues.apache.org/jira/browse/SPARK-44780 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.4.2 >Reporter: Serge Rielau >Priority: Major > > SQL Session variables have been added with: SPARK-42849. > Here we add the docs for it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44780) Docuement SQL Session variables
Serge Rielau created SPARK-44780: Summary: Docuement SQL Session variables Key: SPARK-44780 URL: https://issues.apache.org/jira/browse/SPARK-44780 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 3.4.2 Reporter: Serge Rielau SQL Session variables have been added with: SPARK-42849. Here we add the docs for it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-44778: - Epic Link: (was: SPARK-38783) > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > Introduce the datediff()/date_diff() function, which takes three arguments: > unit, and two datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-44778: - Description: Introduce the timediff() function, which takes three arguments: unit, and two datetime expressions, i.e., {code:sql} datediff(unit, startDatetime, endDatetime) {code} The function can be an alias to timestampdiff(). was: Introduce the datediff()/date_diff() function, which takes three arguments: unit, and two datetime expressions, i.e., {code:sql} datediff(unit, startDatetime, endDatetime) {code} The function can be an alias to timestampdiff(). > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Introduce the timediff() function, which takes three arguments: unit, and two > datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-44778: - Affects Version/s: 4.0.0 (was: 3.3.0) > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Introduce the datediff()/date_diff() function, which takes three arguments: > unit, and two datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
[ https://issues.apache.org/jira/browse/SPARK-44778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-44778: - Fix Version/s: (was: 3.3.0) > Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` > > > Key: SPARK-44778 > URL: https://issues.apache.org/jira/browse/SPARK-44778 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Introduce the datediff()/date_diff() function, which takes three arguments: > unit, and two datetime expressions, i.e., > {code:sql} > datediff(unit, startDatetime, endDatetime) > {code} > The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44778) Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()`
Max Gekk created SPARK-44778: Summary: Add the `TIMEDIFF` aliases for `TIMESTAMPDIFF()` Key: SPARK-44778 URL: https://issues.apache.org/jira/browse/SPARK-44778 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.3.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 3.3.0 Introduce the datediff()/date_diff() function, which takes three arguments: unit, and two datetime expressions, i.e., {code:sql} datediff(unit, startDatetime, endDatetime) {code} The function can be an alias to timestampdiff(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44625) Spark Connect clean up abandoned executions
[ https://issues.apache.org/jira/browse/SPARK-44625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44625. --- Fix Version/s: 3.5.0 Assignee: Juliusz Sompolski Resolution: Fixed > Spark Connect clean up abandoned executions > --- > > Key: SPARK-44625 > URL: https://issues.apache.org/jira/browse/SPARK-44625 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 3.5.0 > > > With reattachable executions, some executions might get orphaned when > ReattachExecute and ReleaseExecute never come. Add a mechanism to track that > and to clean them up. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
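A minimal sketch of such a tracking mechanism (illustrative only; not the Spark Connect implementation, and every name here is an assumption): record when each execution was last touched by a ReattachExecute/ReleaseExecute, and periodically reap anything past a timeout.

```python
import time

# Toy tracker: executions not reattached or released within timeout_s
# are considered abandoned and cleaned up.
class ExecutionTracker:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def touch(self, execution_id, now=None):
        # Called on start, reattach, or release of an execution.
        self.last_seen[execution_id] = now if now is not None else time.monotonic()

    def reap_abandoned(self, now=None):
        # Remove and return every execution idle longer than the timeout.
        now = now if now is not None else time.monotonic()
        dead = [e for e, t in self.last_seen.items() if now - t > self.timeout_s]
        for e in dead:
            del self.last_seen[e]
        return dead

tracker = ExecutionTracker(timeout_s=60)
tracker.touch("exec-1", now=0)
tracker.touch("exec-2", now=50)
print(tracker.reap_abandoned(now=100))  # ['exec-1']
```

In a real server the reaper would run on a background thread and also release any resources (result buffers, jobs) held by the reaped execution.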
[jira] [Created] (SPARK-44777) Allow specifying eagerness in RDD.checkpoint
Emil Ejbyfeldt created SPARK-44777: -- Summary: Allow specifying eagerness in RDD.checkpoint Key: SPARK-44777 URL: https://issues.apache.org/jira/browse/SPARK-44777 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Emil Ejbyfeldt Currently, Dataset.checkpoint takes a boolean to indicate whether the checkpoint should be done eagerly. For the same reasons that one might want to eagerly checkpoint a Dataset, one might want to do the same with an RDD. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44776) Add ProducedRowCount to SparkListenerConnectOperationFinished
Lingkai Kong created SPARK-44776: Summary: Add ProducedRowCount to SparkListenerConnectOperationFinished Key: SPARK-44776 URL: https://issues.apache.org/jira/browse/SPARK-44776 Project: Spark Issue Type: Task Components: Connect Affects Versions: 3.4.1 Reporter: Lingkai Kong As title -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43327) Trigger `committer.setupJob` before plan execution in `FileFormatWriter`
[ https://issues.apache.org/jira/browse/SPARK-43327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718060#comment-17718060 ] ming95 edited comment on SPARK-43327 at 8/11/23 12:58 PM: -- pr : https://github.com/apache/spark/pull/41154 was (Author: zing): pr : https://github.com/apache/spark/pull/41000 > Trigger `committer.setupJob` before plan execution in `FileFormatWriter` > -- > > Key: SPARK-43327 > URL: https://issues.apache.org/jira/browse/SPARK-43327 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 >Reporter: ming95 >Priority: Major > > In this JIRA, the case where `outputOrdering` might not work if AQE is > enabled has been resolved. > https://issues.apache.org/jira/browse/SPARK-40588 > However, since it materializes the AQE plan in advance (triggers > getFinalPhysicalPlan), it may cause committer.setupJob(job) not to execute > when `AdaptiveSparkPlanExec#getFinalPhysicalPlan()` fails with an error. > Normally this step should be executed after committer.setupJob(job). > This may eventually result in the insert-overwrite directory being deleted.
> > {code:java} > import org.apache.hadoop.fs.{FileSystem, Path} > import org.apache.spark.sql.QueryTest > import org.apache.spark.sql.catalyst.TableIdentifier > sql("CREATE TABLE IF NOT EXISTS spark32_overwrite(amt1 int) STORED AS ORC") > sql("CREATE TABLE IF NOT EXISTS spark32_overwrite2(amt1 long) STORED AS ORC") > sql("INSERT OVERWRITE TABLE spark32_overwrite2 select 644164") > sql("set spark.sql.ansi.enabled=true") > val loc = > > spark.sessionState.catalog.getTableMetadata(TableIdentifier("spark32_overwrite")).location > val fs = FileSystem.get(loc, spark.sparkContext.hadoopConfiguration) > println("Location exists: " + fs.exists(new Path(loc))) > try { > sql("INSERT OVERWRITE TABLE spark32_overwrite select amt1 from " + > "(select cast(amt1 as int) as amt1 from spark32_overwrite2 distribute by > amt1)") > } finally { > println("Location exists: " + fs.exists(new Path(loc))) > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
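The ordering bug above reduces to a small sequence of events (a toy sketch, not FileFormatWriter code): if the job is set up before the plan is materialized, a planning failure can still be handled by the committer instead of leaving the output location in a deleted state.

```python
# Toy illustration of the fix: setupJob runs first, so a failure while
# materializing the AQE plan can be aborted cleanly.
events = []

def setup_job():
    events.append("setupJob")

def materialize_plan(fail=False):
    if fail:
        raise RuntimeError("AQE planning failed")
    events.append("materialize")

try:
    setup_job()                    # fixed ordering: committer hook first
    materialize_plan(fail=True)    # getFinalPhysicalPlan() blows up
except RuntimeError:
    events.append("abortJob")      # committer can still clean up

print(events)  # ['setupJob', 'abortJob']
```

In the broken ordering, the exception would fire before `setupJob` ever ran, so there would be no committer state to abort against, which is how the insert-overwrite directory could end up deleted.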
[jira] [Resolved] (SPARK-44761) Add DataStreamWriter.foreachBatch(org.apache.spark.api.java.function.VoidFunction2) signature
[ https://issues.apache.org/jira/browse/SPARK-44761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44761. --- Fix Version/s: 3.5.0 Resolution: Fixed > Add > DataStreamWriter.foreachBatch(org.apache.spark.api.java.function.VoidFunction2) > signature > -- > > Key: SPARK-44761 > URL: https://issues.apache.org/jira/browse/SPARK-44761 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr
[ https://issues.apache.org/jira/browse/SPARK-44760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44760. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42429 [https://github.com/apache/spark/pull/42429] > Index Out Of Bound for JIRA resolution in merge_spark_pr > > > Key: SPARK-44760 > URL: https://issues.apache.org/jira/browse/SPARK-44760 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Fix For: 4.0.0 > > > I -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44775) Add missing version information in DataFrame APIs
Ruifeng Zheng created SPARK-44775: - Summary: Add missing version information in DataFrame APIs Key: SPARK-44775 URL: https://issues.apache.org/jira/browse/SPARK-44775 Project: Spark Issue Type: Improvement Components: Documentation, PySpark Affects Versions: 3.5.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44774) SaveMode.ErrorIfExists does not work with kafka-sql
[ https://issues.apache.org/jira/browse/SPARK-44774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Martynov updated SPARK-44774: --- Description: I'm trying to write a batch dataframe to a Kafka topic with {{mode="error"}}, but when the topic exists it does not raise an exception. Instead it appends data to the topic. Steps to reproduce: 1. Start Kafka: docker-compose.yml {code:yaml} version: '3.9' services: zookeeper: image: bitnami/zookeeper:3.8 environment: ALLOW_ANONYMOUS_LOGIN: 'yes' kafka: image: bitnami/kafka:latest restart: unless-stopped ports: - 9093:9093 environment: ALLOW_PLAINTEXT_LISTENER: 'yes' KAFKA_ENABLE_KRAFT: 'no' KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181 KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093 KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093 KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true' depends_on: - zookeeper {code} {code:bash} docker-compose up -d {code} 2. Start Spark session: {code:bash} pip install pyspark[sql]==3.4.1 {code} {code:python} from pyspark.sql import SparkSession spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate() {code} 3. Create DataFrame and write it to Kafka. First write using {{mode="append"}} to create the topic, then with {{mode="error"}} to raise because the topic already exists: {code} df = spark.createDataFrame([{"value": "string"}]) df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("append").save() # no exception is raised df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("error").save() {code} 4. 
Check topic content - 2 rows are added to topic instead of one: {code:python} spark.read.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("subscribe", "new_topic").load().show(10, False) {code} {code} ++---+-+-+--+---+-+ |key |value |topic|partition|offset|timestamp |timestampType| ++---+-+-+--+---+-+ |null|[73 74 72 69 6E 67]|new_topic|0|0 |2023-08-11 09:39:35.813|0 | |null|[73 74 72 69 6E 67]|new_topic|0|1 |2023-08-11 09:39:36.122|0 | ++---+-+-+--+---+-+ {code} It looks like mode is checked by KafkaSourceProvider, but is not used at all: https://github.com/apache/spark/blob/v3.4.1/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L172-L178 So data is always appended to topic. was: I' trying to write batch dataframe to Kafka topic with {{mode="error"}}, but when topic exists it does not raise exception. Instead it appends data to a topic. Steps to reproduce: 1. Start Kafka: docker-compose.yml {code:yaml} version: '3.9' services: zookeeper: image: bitnami/zookeeper:3.8 environment: ALLOW_ANONYMOUS_LOGIN: 'yes' kafka: image: bitnami/kafka:latest restart: unless-stopped ports: - 9093:9093 environment: ALLOW_PLAINTEXT_LISTENER: 'yes' KAFKA_ENABLE_KRAFT: 'no' KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181 KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093 KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093 KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true' depends_on: - zookeeper {code} {code:bash} docker-compose up -d {code} 2. 
Start Spark session: {code:bash} pip install pyspark[sql]==3.4.1 {code} {code:python} from pyspark.sql import SparkSession spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate() {code} 3. Create DataFrame and write it to Kafka. First write using {{mode="append"}} to create topic, then with {{mode="error"}} to raise because topic already exist: {code} df = spark.createDataFrame([{"value": "string"}]) df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("append").save() # no exception is raised
[jira] [Updated] (SPARK-44774) SaveMode.ErrorIfExists does not work with kafka-sql
[ https://issues.apache.org/jira/browse/SPARK-44774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Martynov updated SPARK-44774:
-----------------------------------
Description:

I'm trying to write a batch DataFrame to a Kafka topic with {{mode="error"}}, but when the topic exists it does not raise an exception. Instead, it appends data to the topic.

Steps to reproduce:

1. Start Kafka:

docker-compose.yml
{code:yaml}
version: '3.9'
services:
  zookeeper:
    image: bitnami/zookeeper:3.8
    environment:
      ALLOW_ANONYMOUS_LOGIN: 'yes'

  kafka:
    image: bitnami/kafka:latest
    restart: unless-stopped
    ports:
      - 9093:9093
    environment:
      ALLOW_PLAINTEXT_LISTENER: 'yes'
      KAFKA_ENABLE_KRAFT: 'yes'
      KAFKA_CLIENT_USERS: onetl
      KAFKA_CLIENT_PASSWORDS: uufoFae9sahSoidoo0eagaidaoreif6z
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS
      KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT
      KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true'
    depends_on:
      - zookeeper
{code}
{code:bash}
docker-compose up -d
{code}

2. Start a Spark session:
{code:bash}
pip install pyspark[sql]==3.4.1
{code}
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate()
{code}

3. Create a DataFrame and write it to Kafka. First write with {{mode="append"}} to create the topic, then with {{mode="error"}}, which should raise because the topic already exists:
{code:python}
df = spark.createDataFrame([{"value": "string"}])
df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("append").save()

# no exception is raised
df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("error").save()
{code}

4. Check the topic content - 2 rows were added to the topic instead of one:
{code:python}
spark.read.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("subscribe", "new_topic").load().show(10, False)
{code}
{code}
+----+-------------------+---------+---------+------+-----------------------+-------------+
|key |value              |topic    |partition|offset|timestamp              |timestampType|
+----+-------------------+---------+---------+------+-----------------------+-------------+
|null|[73 74 72 69 6E 67]|new_topic|0        |0     |2023-08-11 09:39:35.813|0            |
|null|[73 74 72 69 6E 67]|new_topic|0        |1     |2023-08-11 09:39:36.122|0            |
+----+-------------------+---------+---------+------+-----------------------+-------------+
{code}

It looks like the mode is checked by KafkaSourceProvider but never actually used:
https://github.com/apache/spark/blob/6b1ff22dde1ead51cbf370be6e48a802daae58b6/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L172-L178
So data is always appended to the topic.

was:

I'm trying to write a batch DataFrame to a Kafka topic with {{mode="error"}}, but when the topic exists it does not raise an exception - instead it appends data to the topic.

Steps to reproduce:

1. Start Kafka:

docker-compose.yml
{code:yaml}
version: '3.9'
services:
  zookeeper:
    image: bitnami/zookeeper:3.8
    environment:
      ALLOW_ANONYMOUS_LOGIN: 'yes'

  kafka:
    image: bitnami/kafka:latest
    restart: unless-stopped
    ports:
      - 9093:9093
    environment:
      ALLOW_PLAINTEXT_LISTENER: 'yes'
      KAFKA_ENABLE_KRAFT: 'yes'
      KAFKA_CLIENT_USERS: onetl
      KAFKA_CLIENT_PASSWORDS: uufoFae9sahSoidoo0eagaidaoreif6z
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS
      KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT
      KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true'
    depends_on:
      - zookeeper
{code}
{code:bash}
docker-compose up -d
{code}

2. Start a Spark session:
{code:bash}
pip install pyspark[sql]==3.4.1
{code}
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate()
{code}

3. Create a DataFrame and write it to Kafka. First write with {{mode="append"}} to create the topic, then with {{mode="error"}}, which should raise because the topic
[jira] [Assigned] (SPARK-43477) Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43477: - Assignee: Haejoon Lee > Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0. > - > > Key: SPARK-43477 > URL: https://issues.apache.org/jira/browse/SPARK-43477 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43478) Enable SeriesStringTests.test_string_split for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43478: - Assignee: Haejoon Lee > Enable SeriesStringTests.test_string_split for pandas 2.0.0. > > > Key: SPARK-43478 > URL: https://issues.apache.org/jira/browse/SPARK-43478 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable SeriesStringTests.test_string_split for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43478) Enable SeriesStringTests.test_string_split for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43478. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42312 [https://github.com/apache/spark/pull/42312] > Enable SeriesStringTests.test_string_split for pandas 2.0.0. > > > Key: SPARK-43478 > URL: https://issues.apache.org/jira/browse/SPARK-43478 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable SeriesStringTests.test_string_split for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43476) Enable SeriesStringTests.test_string_replace for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43476: - Assignee: Haejoon Lee > Enable SeriesStringTests.test_string_replace for pandas 2.0.0. > -- > > Key: SPARK-43476 > URL: https://issues.apache.org/jira/browse/SPARK-43476 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable SeriesStringTests.test_string_replace for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43477) Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43477. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42312 [https://github.com/apache/spark/pull/42312] > Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0. > - > > Key: SPARK-43477 > URL: https://issues.apache.org/jira/browse/SPARK-43477 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44774) SaveMode.ErrorIfExists does not work with kafka-sql
Maxim Martynov created SPARK-44774:
-----------------------------------

Summary: SaveMode.ErrorIfExists does not work with kafka-sql
Key: SPARK-44774
URL: https://issues.apache.org/jira/browse/SPARK-44774
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.1
Reporter: Maxim Martynov

I'm trying to write a batch DataFrame to a Kafka topic with {{mode="error"}}, but when the topic exists it does not raise an exception - instead it appends data to the topic.

Steps to reproduce:

1. Start Kafka:

docker-compose.yml
{code:yaml}
version: '3.9'
services:
  zookeeper:
    image: bitnami/zookeeper:3.8
    environment:
      ALLOW_ANONYMOUS_LOGIN: 'yes'

  kafka:
    image: bitnami/kafka:latest
    restart: unless-stopped
    ports:
      - 9093:9093
    environment:
      ALLOW_PLAINTEXT_LISTENER: 'yes'
      KAFKA_ENABLE_KRAFT: 'yes'
      KAFKA_CLIENT_USERS: onetl
      KAFKA_CLIENT_PASSWORDS: uufoFae9sahSoidoo0eagaidaoreif6z
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL_PLAINTEXT_ANONYMOUS
      KAFKA_CFG_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL_PLAINTEXT_ANONYMOUS://kafka:9092,EXTERNAL_PLAINTEXT_ANONYMOUS://localhost:9093
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT,EXTERNAL_PLAINTEXT_ANONYMOUS:PLAINTEXT
      KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: 'true'
    depends_on:
      - zookeeper
{code}
{code:bash}
docker-compose up -d
{code}

2. Start a Spark session:
{code:bash}
pip install pyspark[sql]==3.4.1
{code}
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1").getOrCreate()
{code}

3. Create a DataFrame and write it to Kafka. First write with {{mode="append"}} to create the topic, then with {{mode="error"}}, which should raise because the topic already exists:
{code:python}
df = spark.createDataFrame([{"value": "string"}])
df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("append").save()

# no exception is raised
df.write.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("topic", "new_topic").mode("error").save()
{code}

4. Check the topic content - 2 rows were added to the topic instead of one:
{code:python}
spark.read.format("kafka").option("kafka.bootstrap.servers", "localhost:9093").option("subscribe", "new_topic").load().show(10, False)
{code}
{code}
+----+-------------------+---------+---------+------+-----------------------+-------------+
|key |value              |topic    |partition|offset|timestamp              |timestampType|
+----+-------------------+---------+---------+------+-----------------------+-------------+
|null|[73 74 72 69 6E 67]|new_topic|0        |0     |2023-08-11 09:39:35.813|0            |
|null|[73 74 72 69 6E 67]|new_topic|0        |1     |2023-08-11 09:39:36.122|0            |
+----+-------------------+---------+---------+------+-----------------------+-------------+
{code}

It looks like the mode is checked by KafkaSourceProvider but never actually used:
https://github.com/apache/spark/blob/6b1ff22dde1ead51cbf370be6e48a802daae58b6/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L172-L178
So data is always appended to the topic.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
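To make the expected behavior concrete, the SaveMode semantics the reporter is asking for can be sketched as a plain function. This is a hypothetical helper (neither Spark nor Kafka connector code): {{error}}/{{errorifexists}} should fail when the topic exists, {{ignore}} should skip the write, and only {{append}} should add rows.

```python
# Hypothetical sketch of the SaveMode semantics SPARK-44774 expects from
# the Kafka sink; names and return values are illustrative only.
def resolve_kafka_write(mode: str, topic_exists: bool) -> str:
    mode = mode.lower()
    if mode not in {"append", "overwrite", "error", "errorifexists", "ignore"}:
        raise ValueError(f"unknown save mode: {mode}")
    if topic_exists:
        if mode in {"error", "errorifexists"}:
            # The branch the reporter expects to be taken; the current
            # connector silently appends instead.
            raise RuntimeError(f"topic already exists and mode is '{mode}'")
        if mode == "ignore":
            return "skip"
    return "write"
```

Per the linked KafkaSourceProvider code, the connector validates the mode but then performs the same append regardless, which is why both writes in step 3 succeed.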
[jira] [Resolved] (SPARK-43476) Enable SeriesStringTests.test_string_replace for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43476. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42312 [https://github.com/apache/spark/pull/42312] > Enable SeriesStringTests.test_string_replace for pandas 2.0.0. > -- > > Key: SPARK-43476 > URL: https://issues.apache.org/jira/browse/SPARK-43476 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable SeriesStringTests.test_string_replace for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44731) Support 'spark.sql.timestampType' in Python Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-44731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-44731. --- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42445 [https://github.com/apache/spark/pull/42445] > Support 'spark.sql.timestampType' in Python Spark Connect client > > > Key: SPARK-44731 > URL: https://issues.apache.org/jira/browse/SPARK-44731 > Project: Spark > Issue Type: Task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > If a Spark session enables 'spark.sql.timestampType', datetime should be > inferred as the TimestampNTZ type. However, this isn't implemented yet on the Python > client side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
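The inference rule described in this ticket can be sketched as follows. The function and type names are illustrative placeholders, not the real Spark Connect client API: when the session prefers TIMESTAMP_NTZ, a naive datetime (no tzinfo) should be inferred as TimestampNTZ, otherwise as the regular timestamp type.

```python
from datetime import datetime, timezone

# Illustrative sketch of the intended inference, assuming a boolean flag
# derived from the 'spark.sql.timestampType' session conf; names are
# placeholders, not the actual client implementation.
def infer_timestamp_type(value: datetime, prefer_ntz: bool) -> str:
    if prefer_ntz and value.tzinfo is None:
        return "TimestampNTZType"  # no time zone attached
    return "TimestampType"         # zone-aware (or NTZ not preferred)

naive = datetime(2023, 8, 11, 9, 39, 35)
aware = datetime(2023, 8, 11, 9, 39, 35, tzinfo=timezone.utc)
```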
[jira] [Assigned] (SPARK-44731) Support 'spark.sql.timestampType' in Python Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-44731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-44731: - Assignee: Hyukjin Kwon > Support 'spark.sql.timestampType' in Python Spark Connect client > > > Key: SPARK-44731 > URL: https://issues.apache.org/jira/browse/SPARK-44731 > Project: Spark > Issue Type: Task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > > If a Spark session enables 'spark.sql.timestampType', datetime should be > inferred as the TimestampNTZ type. However, this isn't implemented yet on the Python > client side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44773) Code-gen CodegenFallback expression in WholeStageCodegen if possible
Wan Kun created SPARK-44773:
----------------------------

Summary: Code-gen CodegenFallback expression in WholeStageCodegen if possible
Key: SPARK-44773
URL: https://issues.apache.org/jira/browse/SPARK-44773
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.0
Reporter: Wan Kun

Currently, neither the WholeStageCodegen framework nor the SubExpressionElimination framework supports CodegenFallback expressions, even though a CodegenFallback expression that implements the nullSafeEval method could be code-generated just like common expressions. Today such expressions are always executed in a new SpecificUnsafeProjection class, so their sub-expressions cannot be eliminated.

For example:

SQL:
{code:sql}
SELECT from_json(regexp_replace(s, 'a', 'x'), 'x INT, b DOUBLE').x,
       from_json(regexp_replace(s, 'a', 'x'), 'x INT, b DOUBLE').b
FROM values('{"a":1, "b":0.8}') t(s)
{code}

Plan:
{code:java}
*(1) Project [from_json(StructField(x,IntegerType,true), regexp_replace(s#218, a, x, 1), Some(America/Los_Angeles)).x AS from_json(regexp_replace(s, a, x, 1)).x#219, from_json(StructField(b,DoubleType,true), regexp_replace(s#218, a, x, 1), Some(America/Los_Angeles)).b AS from_json(regexp_replace(s, a, x, 1)).b#220]
+- *(1) LocalTableScan [s#218]
{code}

Because org.apache.spark.sql.catalyst.expressions.JsonToStructs is a CodegenFallback expression, the result of {*}regexp_replace(s, 'a', 'x'){*} cannot be reused. If we support code-gen for JsonToStructs in the WholeStageCodegen framework, the result of {*}regexp_replace(s, 'a', 'x'){*} can be reused.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
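To see why sub-expression elimination matters here, consider a small stand-in in Python (json/re stand in for from_json/regexp_replace; this is not Spark code). An eliminated plan would evaluate the shared inner expression once and reuse the parsed result for both field accesses:

```python
import json
import re

# The SQL above evaluates regexp_replace(s, 'a', 'x') twice, once per
# from_json(...).field access. With sub-expression elimination it would
# behave like this: one replace, one parse, two cheap field reads.
s = '{"a":1, "b":0.8}'
replaced = re.sub("a", "x", s)  # shared subexpression, evaluated once
row = json.loads(replaced)      # stand-in for from_json(..., 'x INT, b DOUBLE')
x, b = row["x"], row["b"]       # both field accesses reuse the single parse
```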
[jira] [Assigned] (SPARK-44770) Add a displayOrder variable to WebUITab to specify the order in which tabs appear
[ https://issues.apache.org/jira/browse/SPARK-44770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-44770: Assignee: Jason Li > Add a displayOrder variable to WebUITab to specify the order in which tabs > appear > - > > Key: SPARK-44770 > URL: https://issues.apache.org/jira/browse/SPARK-44770 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.5.1 >Reporter: Jason Li >Assignee: Jason Li >Priority: Major > > Add a displayOrder variable to WebUITab to specify the order in which tabs > appear. Currently, the tabs are ordered by when they get attached, which > isn't always desired. The default is MIN_VALUE, meaning if it's not > specified, it will appear in the order added before any tabs with a > non-default displayOrder. For example, we would like to have the SQL Tab > appear before the Connect tab; however, based on the code flow, the Connect > tab will be attached first and with the current logic, that tab would also > appear first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44770) Add a displayOrder variable to WebUITab to specify the order in which tabs appear
[ https://issues.apache.org/jira/browse/SPARK-44770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44770. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42442 [https://github.com/apache/spark/pull/42442] > Add a displayOrder variable to WebUITab to specify the order in which tabs > appear > - > > Key: SPARK-44770 > URL: https://issues.apache.org/jira/browse/SPARK-44770 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.5.1 >Reporter: Jason Li >Assignee: Jason Li >Priority: Major > Fix For: 4.0.0 > > > Add a displayOrder variable to WebUITab to specify the order in which tabs > appear. Currently, the tabs are ordered by when they get attached, which > isn't always desired. The default is MIN_VALUE, meaning if it's not > specified, it will appear in the order added before any tabs with a > non-default displayOrder. For example, we would like to have the SQL Tab > appear before the Connect tab; however, based on the code flow, the Connect > tab will be attached first and with the current logic, that tab would also > appear first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
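The ordering rule described in this ticket can be sketched with a stable sort. The field and tab names below are illustrative, not the actual WebUITab API: tabs that keep the MIN_VALUE default preserve their attach order, and a tab with an explicit displayOrder sorts after them.

```python
MIN_VALUE = -(2 ** 31)  # mirrors Scala's Int.MinValue default

# Tabs listed in attach order; "Connect" attaches first but is pushed
# later via an explicit display order (hypothetical values).
tabs = [
    {"name": "Connect", "display_order": 1},
    {"name": "SQL", "display_order": MIN_VALUE},
    {"name": "Jobs", "display_order": MIN_VALUE},
]

# Python's sort is stable, so tabs sharing the default keep attach order.
ordered = [t["name"] for t in sorted(tabs, key=lambda t: t["display_order"])]
```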
[jira] [Assigned] (SPARK-44727) Improve the error message for dynamic allocation conditions
[ https://issues.apache.org/jira/browse/SPARK-44727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-44727: Assignee: Cheng Pan > Improve the error message for dynamic allocation conditions > --- > > Key: SPARK-44727 > URL: https://issues.apache.org/jira/browse/SPARK-44727 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44727) Improve the error message for dynamic allocation conditions
[ https://issues.apache.org/jira/browse/SPARK-44727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44727. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42404 [https://github.com/apache/spark/pull/42404] > Improve the error message for dynamic allocation conditions > --- > > Key: SPARK-44727 > URL: https://issues.apache.org/jira/browse/SPARK-44727 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.5.0, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44737) Should not display json format errors on SQL page for non-SparkThrowables
[ https://issues.apache.org/jira/browse/SPARK-44737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44737. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42407 [https://github.com/apache/spark/pull/42407] > Should not display json format errors on SQL page for non-SparkThrowables > - > > Key: SPARK-44737 > URL: https://issues.apache.org/jira/browse/SPARK-44737 > Project: Spark > Issue Type: Bug > Components: SQL, Web UI >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.5.0, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44737) Should not display json format errors on SQL page for non-SparkThrowables
[ https://issues.apache.org/jira/browse/SPARK-44737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-44737: Assignee: Kent Yao > Should not display json format errors on SQL page for non-SparkThrowables > - > > Key: SPARK-44737 > URL: https://issues.apache.org/jira/browse/SPARK-44737 > Project: Spark > Issue Type: Bug > Components: SQL, Web UI >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org