[jira] [Resolved] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39417. Fix Version/s: 3.3.0 3.4.0 Resolution: Fixed > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Prashant Singh >Assignee: Prashant Singh >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > For partitions with null values we get an NPE on partition discovery; earlier we > used to get `DEFAULT_PARTITION_NAME`. > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
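The stack trace above shows the NPE coming from number normalization being applied before the null check. A minimal plain-Python sketch (not Spark internals) of the null-safe ordering the fix restores; the `DEFAULT_PARTITION_NAME` constant is an assumption based on the Hive/Spark convention:

```python
# Illustrative sketch: build a partition path fragment, substituting a
# default name for null partition values instead of normalizing them.
# The constant is an assumption (Hive/Spark default partition name).
DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"

def remove_leading_zeros(value: str) -> str:
    """Normalize numeric partition values, e.g. '007' -> '7'."""
    try:
        return str(int(value))
    except (TypeError, ValueError):
        return value

def path_fragment(partition_spec: dict) -> str:
    parts = []
    for col, value in partition_spec.items():
        # Guard against None first: calling normalization on a null value
        # is the shape of the failure in the reported stack trace.
        if value is None:
            parts.append(f"{col}={DEFAULT_PARTITION_NAME}")
        else:
            parts.append(f"{col}={remove_leading_zeros(value)}")
    return "/".join(parts)

print(path_fragment({"year": "2022", "region": None}))
# year=2022/region=__HIVE_DEFAULT_PARTITION__
```

The key point is only the ordering: the null branch must be taken before any value normalization runs.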
[jira] [Assigned] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-39417: --- Assignee: Prashant Singh > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Prashant Singh >Assignee: Prashant Singh >Priority: Major > > For partitions with null values we get an NPE on partition discovery; earlier we > used to get `DEFAULT_PARTITION_NAME`. > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Resolved] (SPARK-39415) Local mode supports HadoopDelegationTokenManager
[ https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl resolved SPARK-39415. Resolution: Duplicate > Local mode supports HadoopDelegationTokenManager > > > Key: SPARK-39415 > URL: https://issues.apache.org/jira/browse/SPARK-39415 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: dzcxzl >Priority: Minor > > Currently, in a Kerberos environment, using spark-submit --master=local > --proxy-user xxx cannot access the Hive Metastore, and using --keytab will not > automatically re-login. > {code:java} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483) > {code}
[jira] [Resolved] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
[ https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39421. -- Fix Version/s: 3.3.0 3.2.2 3.4.0 Assignee: Hyukjin Kwon Resolution: Fixed Fixed in https://github.com/apache/spark/pull/36813 > Sphinx build fails with "node class 'meta' is already registered, its > visitors will be overridden" > -- > > Key: SPARK-39421 > URL: https://issues.apache.org/jira/browse/SPARK-39421 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.4.0 > > > {code} > Moving to python/docs directory and building sphinx. > Running Sphinx v3.0.4 > WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It > is required to set this environment variable to '1' in both driver and > executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you > but it does not work if there is a Spark context already launched. > /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > {code} > The Sphinx build apparently fails with the latest docutils (see also > https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.
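The resolution is to pin docutils for the docs build. A sketch of such a pin as a requirements fragment; the exact file and version bound used by the actual fix are in apache/spark#36813, so the bound below is an assumption (docutils 0.18 is the release that changed node registration behavior under Sphinx 3.x):

```
# requirements-style pin for the docs build (illustrative; see
# apache/spark#36813 for the exact bound the fix uses)
sphinx==3.0.4
docutils<0.18
```

Pinning below the breaking release keeps the "node class 'meta' is already registered" error from recurring when the unpinned latest docutils is picked up.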
[jira] [Updated] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
[ https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39421: - Affects Version/s: (was: 3.1.2) (was: 3.0.3) > Sphinx build fails with "node class 'meta' is already registered, its > visitors will be overridden" > -- > > Key: SPARK-39421 > URL: https://issues.apache.org/jira/browse/SPARK-39421 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.2.1, 3.3.0, 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.4.0 > > > {code} > Moving to python/docs directory and building sphinx. > Running Sphinx v3.0.4 > WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It > is required to set this environment variable to '1' in both driver and > executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you > but it does not work if there is a Spark context already launched. > /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > {code} > The Sphinx build apparently fails with the latest docutils (see also > https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.
[jira] [Commented] (SPARK-39425) Add migration guide for PS behavior changes
[ https://issues.apache.org/jira/browse/SPARK-39425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551954#comment-17551954 ] Apache Spark commented on SPARK-39425: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/36816 > Add migration guide for PS behavior changes > --- > > Key: SPARK-39425 > URL: https://issues.apache.org/jira/browse/SPARK-39425 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major >
[jira] [Assigned] (SPARK-39425) Add migration guide for PS behavior changes
[ https://issues.apache.org/jira/browse/SPARK-39425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39425: Assignee: Apache Spark > Add migration guide for PS behavior changes > --- > > Key: SPARK-39425 > URL: https://issues.apache.org/jira/browse/SPARK-39425 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-39425) Add migration guide for PS behavior changes
[ https://issues.apache.org/jira/browse/SPARK-39425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551953#comment-17551953 ] Apache Spark commented on SPARK-39425: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/36816 > Add migration guide for PS behavior changes > --- > > Key: SPARK-39425 > URL: https://issues.apache.org/jira/browse/SPARK-39425 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major >
[jira] [Assigned] (SPARK-39425) Add migration guide for PS behavior changes
[ https://issues.apache.org/jira/browse/SPARK-39425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39425: Assignee: (was: Apache Spark) > Add migration guide for PS behavior changes > --- > > Key: SPARK-39425 > URL: https://issues.apache.org/jira/browse/SPARK-39425 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major >
[jira] [Created] (SPARK-39426) Subquery star select creates broken plan in case of self join
Denis created SPARK-39426: - Summary: Subquery star select creates broken plan in case of self join Key: SPARK-39426 URL: https://issues.apache.org/jira/browse/SPARK-39426 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Denis Subquery star select creates broken plan in case of self join How to reproduce: {code:java} import spark.implicits._ spark.sparkContext.setCheckpointDir(Files.createTempDirectory("some-prefix").toFile.toString) val frame = Seq(1).toDF("id").checkpoint() val joined = frame .join(frame, Seq("id"), "left") .select("id") joined .join(joined, Seq("id"), "left") .as("a") .select("a.*"){code} This query throws exception: {code:java} Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved attribute(s) id#7 missing from id#10,id#11 in operator !Project [id#7, id#10]. Attribute(s) with the same name appear in the operation: id. Please check if the right attribute(s) are used.; Project [id#10, id#4] +- SubqueryAlias a +- Project [id#10, id#4] +- Join LeftOuter, (id#4 = id#10) :- Project [id#4] : +- Project [id#7, id#4] : +- Join LeftOuter, (id#4 = id#7) : :- LogicalRDD [id#4], false : +- LogicalRDD [id#7], false +- Project [id#10] +- !Project [id#7, id#10] +- Join LeftOuter, (id#10 = id#11) :- LogicalRDD [id#10], false +- LogicalRDD [id#11], false at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:51) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:50) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:182) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:471) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:262) at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:262) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:262) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:262) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:262) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:262) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:262) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:262) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:262) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:262) at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:262) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:262) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode
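The "Resolved attribute(s) id#7 missing from id#10,id#11" failure above reflects how Spark resolves columns: by unique expression ID, not by name. A toy model in plain Python (not Spark internals; the numeric IDs are only illustrative) of why a self join forces one side to be re-aliased with fresh IDs, and why a node still holding a stale ID becomes unresolvable:

```python
# Toy model: a plan's output is a list of (name, expression_id) pairs.
# Self-joining a plan puts identical IDs on both sides, so the analyzer
# must mint fresh IDs for one side; any node still referencing an old ID
# afterwards fails resolution, matching the AnalysisException shape above.
import itertools

_ids = itertools.count(100)  # fresh-ID generator (starting value arbitrary)

def fresh_ids(attributes):
    """Re-alias output attributes with fresh IDs and return the old->new
    mapping that downstream references must be rewritten through."""
    mapping = {attr_id: next(_ids) for _, attr_id in attributes}
    return [(name, mapping[attr_id]) for name, attr_id in attributes], mapping

left = [("id", 7)]
right = [("id", 7)]              # self join: same IDs on both sides
right, mapping = fresh_ids(right)

stale_ref = 7                    # a Project still pointing at the old ID
available = {attr_id for _, attr_id in right}
assert stale_ref not in available        # unresolvable: the bug's symptom
assert mapping[stale_ref] in available   # rewriting via the mapping fixes it
```

In the reported plan, the inner `!Project [id#7, id#10]` is exactly such a node: its child was re-aliased to `id#10, id#11`, but the reference to `id#7` was never rewritten.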
[jira] [Commented] (SPARK-39424) `Run documentation build` failed in master branch GA
[ https://issues.apache.org/jira/browse/SPARK-39424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551934#comment-17551934 ] Yang Jie commented on SPARK-39424: -- Thanks ~ > `Run documentation build` failed in master branch GA > > > Key: SPARK-39424 > URL: https://issues.apache.org/jira/browse/SPARK-39424 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': > Python doc generation failed (RuntimeError) > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `block in require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in > `block in require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `require_plugin_files' 
> from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in > `conscientious_require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in > `setup' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in > `initialize' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in > `new' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in > `process' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in > `block in process_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in > `process_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in > `block (2 levels) in init_with_program' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in > `block in execute' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in > `execute' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in > `go' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in > `program' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in > `' > from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in > `load' > from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in > `' > Error: Process completed with exit code 1. {code} > The latest builds seem to have failed: > * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true] > * [https://github.com/apache/spark/runs/6799560292?check_suite_focus=true] > * [https://github.com/apache/spark/runs/6801448545?check_suite_focus=true] > * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true] > > -- This messag
[jira] [Resolved] (SPARK-39424) `Run documentation build` failed in master branch GA
[ https://issues.apache.org/jira/browse/SPARK-39424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39424. -- Resolution: Duplicate Thanks! I actually already created a JIRA and fix :-) > `Run documentation build` failed in master branch GA > > > Key: SPARK-39424 > URL: https://issues.apache.org/jira/browse/SPARK-39424 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': > Python doc generation failed (RuntimeError) > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `block in require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in > `block in require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > 
`require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in > `conscientious_require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in > `setup' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in > `initialize' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in > `new' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in > `process' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in > `block in process_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in > `process_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in > `block (2 levels) in init_with_program' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in > `block in execute' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in > `execute' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in > `go' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in > `program' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in > `' > from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `load' > from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in > `' > Error: Process completed with exit code 1. {code} > The latest builds seem to have failed: > * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true] > * [https://github.com/apache/spark/runs/6799560292?check_suite_focus=true] > * [https://github.com/apache/spark/runs/6801448545?check_suite_focus=true] > * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true]
[jira] [Created] (SPARK-39425) Add migration guide for PS behavior changes
Yikun Jiang created SPARK-39425: --- Summary: Add migration guide for PS behavior changes Key: SPARK-39425 URL: https://issues.apache.org/jira/browse/SPARK-39425 Project: Spark Issue Type: Sub-task Components: Documentation, Pandas API on Spark Affects Versions: 3.4.0 Reporter: Yikun Jiang
[jira] [Commented] (SPARK-39424) `Run documentation build` failed in master branch GA
[ https://issues.apache.org/jira/browse/SPARK-39424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551925#comment-17551925 ] Yang Jie commented on SPARK-39424: -- cc [~hyukjin.kwon] > `Run documentation build` failed in master branch GA > > > Key: SPARK-39424 > URL: https://issues.apache.org/jira/browse/SPARK-39424 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': > Python doc generation failed (RuntimeError) > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `block in require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in > `block in require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > 
`require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in > `conscientious_require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in > `setup' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in > `initialize' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in > `new' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in > `process' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in > `block in process_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in > `process_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in > `block (2 levels) in init_with_program' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in > `block in execute' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in > `execute' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in > `go' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in > `program' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in > `' > from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `load' > from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in > `' > Error: Process completed with exit code 1. {code} > The latest builds seem to have failed: > * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true] > * [https://github.com/apache/spark/runs/6799560292?check_suite_focus=true] > * [https://github.com/apache/spark/runs/6801448545?check_suite_focus=true] > * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true] > > --
[jira] [Created] (SPARK-39424) `Run documentation build` failed in master branch GA
Yang Jie created SPARK-39424: Summary: `Run documentation build` failed in master branch GA Key: SPARK-39424 URL: https://issues.apache.org/jira/browse/SPARK-39424 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie {code:java} /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate the documentation; however, your version was 1.3.5 warnings.warn( Warning, treated as error: node class 'meta' is already registered, its visitors will be overridden make: *** [Makefile:35: html] Error 2 Jekyll 4.2.1 Please append `--trace` to the `build` command for any additional information or backtrace. /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': Python doc generation failed (RuntimeError) from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `require' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `block in require_with_graceful_fail' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `each' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `require_with_graceful_fail' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in `block in require_plugin_files' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `each' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `require_plugin_files' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in `conscientious_require' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in `setup' from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in `initialize' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `new' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `process' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `block in process_with_graceful_fail' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `each' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `process_with_graceful_fail' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in `block (2 levels) in init_with_program' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `block in execute' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `each' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `execute' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in `go' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in `program' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in `' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `load' from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `' Error: Process completed with exit code 1. 
{code} The latest builds seem to have failed: * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true] * [https://github.com/apache/spark/runs/6799560292?check_suite_focus=true] * [https://github.com/apache/spark/runs/6801448545?check_suite_focus=true] -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
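SPARK-39424 is the same Sphinx failure tracked in SPARK-39421 below, where the proposed fix is to pin the docutils version. As a hedged sketch of that constraint (the `<0.18` upper bound is an assumption drawn from the linked FLINK-24662 report, not a confirmed value from the Spark fix):

```python
# Sketch of a version-pin check. The "0.18" bound is an assumption based
# on FLINK-24662, which reported the same "node class 'meta' is already
# registered" error with newer docutils releases under Sphinx 3.x.
def satisfies_pin(version: str, upper: str = "0.18") -> bool:
    """True if `version` is acceptable under the constraint docutils<upper."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(version) < as_tuple(upper)

print(satisfies_pin("0.17.1"))  # -> True: older release line, accepted
print(satisfies_pin("0.18.1"))  # -> False: newer release line, rejected
```

Pinning in the docs build environment (rather than upgrading Sphinx) is the smaller change, since the GA image already fixes Sphinx at v3.0.4.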
[jira] [Resolved] (SPARK-39236) Make CreateTable API and ListTables API compatible
[ https://issues.apache.org/jira/browse/SPARK-39236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-39236. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36586 [https://github.com/apache/spark/pull/36586] > Make CreateTable API and ListTables API compatible > --- > > Key: SPARK-39236 > URL: https://issues.apache.org/jira/browse/SPARK-39236 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > > https://github.com/apache/spark/blob/c6dccc7dd412a95007f5bb2584d69b85ff9ebf8e/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L364 > https://github.com/apache/spark/blob/c6dccc7dd412a95007f5bb2584d69b85ff9ebf8e/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L99 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39236) Make CreateTable API and ListTables API compatible
[ https://issues.apache.org/jira/browse/SPARK-39236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-39236: --- Assignee: Rui Wang > Make CreateTable API and ListTables API compatible > --- > > Key: SPARK-39236 > URL: https://issues.apache.org/jira/browse/SPARK-39236 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > > https://github.com/apache/spark/blob/c6dccc7dd412a95007f5bb2584d69b85ff9ebf8e/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L364 > https://github.com/apache/spark/blob/c6dccc7dd412a95007f5bb2584d69b85ff9ebf8e/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L99 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39423) Spark SQL create table using jdbc: add preSql option
[ https://issues.apache.org/jira/browse/SPARK-39423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WeiNan Zhao updated SPARK-39423: Description: While using Spark SQL, I created a table with CREATE TABLE ... USING jdbc, but before inserting I may need to delete some of the existing data first, so I would like this option to be exposed; the attached picture shows a usage example. I can submit a pull request to solve this problem; please assign it to me. Thanks. !image-2022-06-09-10-48-05-347.png! was: In my recent process of using Spark Sql, I tried to use create using jdbc to create a spark table that I need to use, but I may need to consider that before inserting, I need to delete some of the previous data, so I want to be able to expose This option, the use example can refer to the following picture. !image-2022-06-09-10-47-25-558.png! > Spark SQL create table using jdbc: add preSql option > --- > > Key: SPARK-39423 > URL: https://issues.apache.org/jira/browse/SPARK-39423 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: WeiNan Zhao >Priority: Major > Attachments: image-2022-06-09-10-48-05-347.png > > > While using Spark SQL, I created a table with CREATE TABLE ... USING jdbc, but before > inserting I may need to delete some of the existing data first, so I would like this > option to be exposed; the attached picture shows a usage example. > I can submit a pull request to solve this problem; please assign it to me. > Thanks. > !image-2022-06-09-10-48-05-347.png! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39423) Spark SQL create table using jdbc: add preSql option
[ https://issues.apache.org/jira/browse/SPARK-39423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WeiNan Zhao updated SPARK-39423: Attachment: image-2022-06-09-10-48-05-347.png > Spark SQL create table using jdbc: add preSql option > --- > > Key: SPARK-39423 > URL: https://issues.apache.org/jira/browse/SPARK-39423 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: WeiNan Zhao >Priority: Major > Attachments: image-2022-06-09-10-48-05-347.png > > > While using Spark SQL, I created a table with CREATE TABLE ... USING jdbc, but before > inserting I may need to delete some of the existing data first, so I would like this > option to be exposed; the attached picture shows a usage example. > > !image-2022-06-09-10-47-25-558.png! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39423) Spark SQL create table using jdbc: add preSql option
WeiNan Zhao created SPARK-39423: --- Summary: Spark SQL create table using jdbc: add preSql option Key: SPARK-39423 URL: https://issues.apache.org/jira/browse/SPARK-39423 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.1 Reporter: WeiNan Zhao While using Spark SQL, I created a table with CREATE TABLE ... USING jdbc, but before inserting I may need to delete some of the existing data first, so I would like this option to be exposed; the attached picture shows a usage example. !image-2022-06-09-10-47-25-558.png! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
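Since the attached picture is not visible in this digest, here is a hedged sketch of what the requested behavior might look like; the `preSql` option name and this helper are hypothetical and do not exist in Spark today:

```python
# Hypothetical model of the proposed JDBC-writer behavior: run one
# caller-supplied statement before the insert. Nothing here is an
# existing Spark API; it only illustrates the ordering being requested.
def jdbc_write_statements(table, pre_sql=None):
    """Return the statements the writer would issue, in order."""
    statements = []
    if pre_sql:
        statements.append(pre_sql)  # e.g. a cleanup DELETE for old data
    statements.append(f"INSERT INTO {table} VALUES ...")  # placeholder insert
    return statements

stmts = jdbc_write_statements(
    "target_tbl", pre_sql="DELETE FROM target_tbl WHERE ds = '2022-06-08'")
print(stmts[0])  # the preSql statement runs first, before the insert
```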
[jira] [Commented] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs
[ https://issues.apache.org/jira/browse/SPARK-37670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551914#comment-17551914 ] Apache Spark commented on SPARK-37670: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36815 > Support predicate pushdown and column pruning for de-duped CTEs > --- > > Key: SPARK-37670 > URL: https://issues.apache.org/jira/browse/SPARK-37670 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wei Xue >Assignee: Wei Xue >Priority: Major > Fix For: 3.3.0, 3.2.2 > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations
[ https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551909#comment-17551909 ] Apache Spark commented on SPARK-39422: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/36814 > SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported > serde configurations > - > > Key: SPARK-39422 > URL: https://issues.apache.org/jira/browse/SPARK-39422 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Minor > > If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported > Serde configuration, Spark will return an error message like > {code:java} > org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE > against table rcFileTable, which is created by Hive and uses the following > unsupported serde configuration > SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe > INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: > org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code} > which is confusing to end users. > In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS > SERDE` to users (similar to other error messages in this code path). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations
[ https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551907#comment-17551907 ] Apache Spark commented on SPARK-39422: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/36814 > SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported > serde configurations > - > > Key: SPARK-39422 > URL: https://issues.apache.org/jira/browse/SPARK-39422 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Minor > > If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported > Serde configuration, Spark will return an error message like > {code:java} > org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE > against table rcFileTable, which is created by Hive and uses the following > unsupported serde configuration > SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe > INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: > org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code} > which is confusing to end users. > In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS > SERDE` to users (similar to other error messages in this code path). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations
Josh Rosen created SPARK-39422: -- Summary: SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations Key: SPARK-39422 URL: https://issues.apache.org/jira/browse/SPARK-39422 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Environment: If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported Serde configuration, Spark will return an error message like {code:java} org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE against table rcFileTable, which is created by Hive and uses the following unsupported serde configuration SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code} which is confusing to end users. In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS SERDE` to users (similar to other error messages in this code path). Reporter: Josh Rosen -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
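The command the report wants surfaced already exists; as a small sketch, a helper that renders the suggested hint. The wording is assumed, not the fix's actual text; only `SHOW CREATE TABLE ... AS SERDE` itself is real Spark SQL, and `rcFileTable` is the table from the error above:

```python
# Sketch of the suggestion the error message could carry. The hint text
# is an assumption; the AS SERDE form of the command is what Spark
# supports for Hive-serde tables it cannot render as native DDL.
def as_serde_hint(table: str) -> str:
    return (f"Use 'SHOW CREATE TABLE {table} AS SERDE' "
            "to show Hive DDL instead.")

print(as_serde_hint("rcFileTable"))
```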
[jira] [Updated] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations
[ https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-39422: --- Priority: Minor (was: Major) > SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported > serde configurations > - > > Key: SPARK-39422 > URL: https://issues.apache.org/jira/browse/SPARK-39422 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Minor > > If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported > Serde configuration, Spark will return an error message like > {code:java} > org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE > against table rcFileTable, which is created by Hive and uses the following > unsupported serde configuration > SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe > INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: > org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code} > which is confusing to end users. > In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS > SERDE` to users (similar to other error messages in this code path). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations
[ https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-39422: --- Description: If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported Serde configuration, Spark will return an error message like {code:java} org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE against table rcFileTable, which is created by Hive and uses the following unsupported serde configuration SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code} which is confusing to end users. In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS SERDE` to users (similar to other error messages in this code path). > SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported > serde configurations > - > > Key: SPARK-39422 > URL: https://issues.apache.org/jira/browse/SPARK-39422 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > > If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported > Serde configuration, Spark will return an error message like > {code:java} > org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE > against table rcFileTable, which is created by Hive and uses the following > unsupported serde configuration > SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe > INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: > org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code} > which is confusing to end users. > In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS > SERDE` to users (similar to other error messages in this code path). 
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations
[ https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-39422: -- Assignee: Josh Rosen > SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported > serde configurations > - > > Key: SPARK-39422 > URL: https://issues.apache.org/jira/browse/SPARK-39422 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 > Environment: If you run `SHOW CREATE TABLE` against a Hive table > which uses an unsupported Serde configuration, Spark will return an error > message like > {code:java} > org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE > against table rcFileTable, which is created by Hive and uses the following > unsupported serde configuration > SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe > INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: > org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code} > which is confusing to end users. > In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS > SERDE` to users (similar to other error messages in this code path). >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations
[ https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-39422: --- Environment: (was: If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported Serde configuration, Spark will return an error message like {code:java} org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE against table rcFileTable, which is created by Hive and uses the following unsupported serde configuration SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code} which is confusing to end users. In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS SERDE` to users (similar to other error messages in this code path).) > SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported > serde configurations > - > > Key: SPARK-39422 > URL: https://issues.apache.org/jira/browse/SPARK-39422 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39349) Add a CheckError() method to SparkFunSuite
[ https://issues.apache.org/jira/browse/SPARK-39349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-39349: --- Assignee: Serge Rielau > Add a CheckError() method to SparkFunSuite > -- > > Key: SPARK-39349 > URL: https://issues.apache.org/jira/browse/SPARK-39349 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.1 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > > We want to standardize on a generic way to QA error messages without impeding > the ability to enhance/rework error messages. > CheckError() allows for efficiently asserting on the "payload": > * Error class, subclass > * SQLState > * Parameters (both names and values) > > It does not test the actual English text, which is the feature. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39349) Add a CheckError() method to SparkFunSuite
[ https://issues.apache.org/jira/browse/SPARK-39349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-39349. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36693 [https://github.com/apache/spark/pull/36693] > Add a CheckError() method to SparkFunSuite > -- > > Key: SPARK-39349 > URL: https://issues.apache.org/jira/browse/SPARK-39349 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.1 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Fix For: 3.4.0 > > > We want to standardize on a generic way to QA error messages without impeding > the ability to enhance/rework error messages. > CheckError() allows for efficiently asserting on the "payload": > * Error class, subclass > * SQLState > * Parameters (both names and values) > > It does not test the actual English text, which is the feature. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
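As a language-neutral sketch of what a CheckError()-style helper asserts (Spark's real helper lives in the Scala SparkFunSuite; the class and function names below are illustrative, not Spark's API):

```python
# Illustrative model: the assertion targets the structured payload
# (error class, SQLSTATE, message parameters), never the English text,
# so reworded messages do not break tests.
class StructuredError(Exception):
    def __init__(self, error_class, sql_state, parameters):
        super().__init__(error_class)
        self.error_class = error_class
        self.sql_state = sql_state
        self.parameters = parameters

def check_error(exc, error_class, sql_state, parameters):
    assert exc.error_class == error_class
    assert exc.sql_state == sql_state
    assert exc.parameters == parameters  # both names and values

# "DIVIDE_BY_ZERO"/"22012" mirror a real Spark error class and SQLSTATE;
# the parameter map is a plausible example, not a captured payload.
err = StructuredError("DIVIDE_BY_ZERO", "22012",
                      {"config": "spark.sql.ansi.enabled"})
check_error(err, "DIVIDE_BY_ZERO", "22012",
            {"config": "spark.sql.ansi.enabled"})
```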
[jira] [Resolved] (SPARK-39410) Exclude rules in analyzer
[ https://issues.apache.org/jira/browse/SPARK-39410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39410. -- Resolution: Invalid > Exclude rules in analyzer > -- > > Key: SPARK-39410 > URL: https://issues.apache.org/jira/browse/SPARK-39410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: shi yuhang >Priority: Major > > I have found that we can use `spark.sql.optimizer.excludedRules` to exclude > rules in the optimizer. I'd like to have a similar capability in the analyzer. > I don't know if it is possible or if it breaks the design of catalyst? -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39410) Exclude rules in analyzer
[ https://issues.apache.org/jira/browse/SPARK-39410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551887#comment-17551887 ] Hyukjin Kwon commented on SPARK-39410: -- Analyzer rules cannot be excluded, because Spark SQL cannot work without them. Optimizer rules can be excluded because, conceptually, a query should still work without any optimizer rule. > Exclude rules in analyzer > -- > > Key: SPARK-39410 > URL: https://issues.apache.org/jira/browse/SPARK-39410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: shi yuhang >Priority: Major > > I have found that we can use `spark.sql.optimizer.excludedRules` to exclude > rules in the optimizer. I'd like to have a similar capability in the analyzer. > I don't know if it is possible or if it breaks the design of catalyst? -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
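For reference, the optimizer-side mechanism the report wants mirrored takes a comma-separated list of fully qualified rule names. A small sketch of the conf format (the dict stands in for a SparkConf; the rule names are real Catalyst optimizer rules, though Spark only honors exclusions of non-essential rules):

```python
# Sketch of the documented spark.sql.optimizer.excludedRules format:
# a comma-separated list of fully qualified Catalyst rule names.
def excluded_rules_conf(*rules):
    return {"spark.sql.optimizer.excludedRules": ",".join(rules)}

conf = excluded_rules_conf(
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding",
    "org.apache.spark.sql.catalyst.optimizer.NullPropagation",
)
print(conf["spark.sql.optimizer.excludedRules"])
```

No analyzer-side equivalent exists, which is exactly the point of the comment above: resolution rules are required for a query to be analyzable at all.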
[jira] [Updated] (SPARK-39420) Support ANALYZE TABLE on v2 tables
[ https://issues.apache.org/jira/browse/SPARK-39420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39420: - Priority: Major (was: Blocker) > Support ANALYZE TABLE on v2 tables > -- > > Key: SPARK-39420 > URL: https://issues.apache.org/jira/browse/SPARK-39420 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 3.2.1 >Reporter: Felipe >Priority: Major > > According to [https://github.com/delta-io/delta/pull/840,] to implement > ANALYZE TABLE in Delta, we need to add the missing APIs in Spark to allow a > data source to report the file set to calculate the stats. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
[ https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39421: Assignee: (was: Apache Spark) > Sphinx build fails with "node class 'meta' is already registered, its > visitors will be overridden" > -- > > Key: SPARK-39421 > URL: https://issues.apache.org/jira/browse/SPARK-39421 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > Moving to python/docs directory and building sphinx. > Running Sphinx v3.0.4 > WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It > is required to set this environment variable to '1' in both driver and > executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you > but it does not work if there is a Spark context already launched. > /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > {code} > Sphinx build fails apparently with the latest docutils (see also > https://issues.apache.org/jira/browse/FLINK-24662). we should pin the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
[ https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551886#comment-17551886 ] Apache Spark commented on SPARK-39421: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/36813 > Sphinx build fails with "node class 'meta' is already registered, its > visitors will be overridden" > -- > > Key: SPARK-39421 > URL: https://issues.apache.org/jira/browse/SPARK-39421 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > Moving to python/docs directory and building sphinx. > Running Sphinx v3.0.4 > WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It > is required to set this environment variable to '1' in both driver and > executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you > but it does not work if there is a Spark context already launched. > /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > {code} > Sphinx build fails apparently with the latest docutils (see also > https://issues.apache.org/jira/browse/FLINK-24662). we should pin the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
[ https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39421: Assignee: Apache Spark > Sphinx build fails with "node class 'meta' is already registered, its > visitors will be overridden" > -- > > Key: SPARK-39421 > URL: https://issues.apache.org/jira/browse/SPARK-39421 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > {code} > Moving to python/docs directory and building sphinx. > Running Sphinx v3.0.4 > WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It > is required to set this environment variable to '1' in both driver and > executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you > but it does not work if there is a Spark context already launched. > /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > {code} > Sphinx build fails apparently with the latest docutils (see also > https://issues.apache.org/jira/browse/FLINK-24662). we should pin the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
[ https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39421: - Affects Version/s: 3.2.1 3.1.2 3.0.3 3.3.0 > Sphinx build fails with "node class 'meta' is already registered, its > visitors will be overridden" > -- > > Key: SPARK-39421 > URL: https://issues.apache.org/jira/browse/SPARK-39421 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > Moving to python/docs directory and building sphinx. > Running Sphinx v3.0.4 > WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It > is required to set this environment variable to '1' in both driver and > executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you > but it does not work if there is a Spark context already launched. > /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > {code} > Sphinx build fails apparently with the latest docutils (see also > https://issues.apache.org/jira/browse/FLINK-24662). we should pin the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
Hyukjin Kwon created SPARK-39421: Summary: Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden" Key: SPARK-39421 URL: https://issues.apache.org/jira/browse/SPARK-39421 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 3.4.0 Environment: {code} Moving to python/docs directory and building sphinx. Running Sphinx v3.0.4 WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched. /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate the documentation; however, your version was 1.3.5 warnings.warn( Warning, treated as error: node class 'meta' is already registered, its visitors will be overridden make: *** [Makefile:35: html] Error 2 Jekyll 4.2.1 Please append `--trace` to the `build` command for any additional information or backtrace. {code} Sphinx build fails apparently with the latest docutils (see also https://issues.apache.org/jira/browse/FLINK-24662). we should pin the version. Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
[ https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39421: - Environment: (was: {code} Moving to python/docs directory and building sphinx. Running Sphinx v3.0.4 WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched. /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate the documentation; however, your version was 1.3.5 warnings.warn( Warning, treated as error: node class 'meta' is already registered, its visitors will be overridden make: *** [Makefile:35: html] Error 2 Jekyll 4.2.1 Please append `--trace` to the `build` command for any additional information or backtrace. {code} Sphinx build fails apparently with the latest docutils (see also https://issues.apache.org/jira/browse/FLINK-24662). we should pin the version.) > Sphinx build fails with "node class 'meta' is already registered, its > visitors will be overridden" > -- > > Key: SPARK-39421 > URL: https://issues.apache.org/jira/browse/SPARK-39421 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
[ https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39421: - Description: {code} Moving to python/docs directory and building sphinx. Running Sphinx v3.0.4 WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched. /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate the documentation; however, your version was 1.3.5 warnings.warn( Warning, treated as error: node class 'meta' is already registered, its visitors will be overridden make: *** [Makefile:35: html] Error 2 Jekyll 4.2.1 Please append `--trace` to the `build` command for any additional information or backtrace. {code} Sphinx build fails apparently with the latest docutils (see also https://issues.apache.org/jira/browse/FLINK-24662). we should pin the version. > Sphinx build fails with "node class 'meta' is already registered, its > visitors will be overridden" > -- > > Key: SPARK-39421 > URL: https://issues.apache.org/jira/browse/SPARK-39421 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > Moving to python/docs directory and building sphinx. > Running Sphinx v3.0.4 > WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It > is required to set this environment variable to '1' in both driver and > executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you > but it does not work if there is a Spark context already launched. 
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: > Warning: Latest version of pandas(>=1.4.0) is required to generate the > documentation; however, your version was 1.3.5 > warnings.warn( > Warning, treated as error: > node class 'meta' is already registered, its visitors will be overridden > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > {code} > Sphinx build fails apparently with the latest docutils (see also > https://issues.apache.org/jira/browse/FLINK-24662). we should pin the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
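The ticket's proposed fix is to pin the docutils version. As a loose illustration (pure Python, with a hypothetical upper bound of 0.18 — the actual bound is whatever the fix settles on), a build script could guard against an incompatible docutils before invoking Sphinx:

```python
def version_tuple(version):
    """Parse the leading 'major.minor' of a version string like '0.17.1'."""
    return tuple(int(part) for part in version.split(".")[:2])

def check_docutils(installed, upper_bound=(0, 18)):
    """Fail fast if the installed docutils is at or above the (hypothetical) bound."""
    if version_tuple(installed) >= upper_bound:
        raise RuntimeError(
            "docutils %s is too new for this Sphinx build; "
            "pin docutils<%d.%d" % ((installed,) + upper_bound))
```

In practice the pin would live in the docs build's requirements file rather than a runtime check; the guard above only illustrates the version comparison.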
[jira] [Created] (SPARK-39420) Support ANALYZE TABLE on v2 tables
Felipe created SPARK-39420: -- Summary: Support ANALYZE TABLE on v2 tables Key: SPARK-39420 URL: https://issues.apache.org/jira/browse/SPARK-39420 Project: Spark Issue Type: Improvement Components: Optimizer Affects Versions: 3.2.1 Reporter: Felipe According to [https://github.com/delta-io/delta/pull/840], to implement ANALYZE TABLE in Delta, we need to add the missing APIs in Spark to allow a data source to report the file set used to calculate the stats. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39418) DECODE docs refer to Oracle instead of Spark
[ https://issues.apache.org/jira/browse/SPARK-39418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-39418. - Resolution: Duplicate > DECODE docs refer to Oracle instead of Spark > > > Key: SPARK-39418 > URL: https://issues.apache.org/jira/browse/SPARK-39418 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Serge Rielau >Priority: Critical > > [https://spark.apache.org/docs/latest/api/sql/index.html#decode] > If no match is found, then {color:#de350b}Oracle{color} returns default. If > default is omitted, returns null. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39418) DECODE docs refer to Oracle instead of Spark
[ https://issues.apache.org/jira/browse/SPARK-39418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551877#comment-17551877 ] Wenchen Fan commented on SPARK-39418: - yea this has been fixed by https://issues.apache.org/jira/browse/SPARK-39286 > DECODE docs refer to Oracle instead of Spark > > > Key: SPARK-39418 > URL: https://issues.apache.org/jira/browse/SPARK-39418 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Serge Rielau >Priority: Critical > > [https://spark.apache.org/docs/latest/api/sql/index.html#decode] > If no match is found, then {color:#de350b}Oracle{color} returns default. If > default is omitted, returns null. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39400) spark-sql remain hive resource download dir after exit
[ https://issues.apache.org/jira/browse/SPARK-39400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-39400. - Resolution: Fixed Issue resolved by pull request 36786 [https://github.com/apache/spark/pull/36786] > spark-sql remain hive resource download dir after exit > -- > > Key: SPARK-39400 > URL: https://issues.apache.org/jira/browse/SPARK-39400 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.4.0 > > > {code:java} > drwxrwxr-x 2 yi.zhu yi.zhu4096 Jun 7 18:06 > da92eec4-2db1-4941-9e53-b28c38e25e31_resources > drwxrwxr-x 2 yi.zhu yi.zhu4096 Jun 7 18:14 > dad364e8-ed1d-4ced-a6df-4897361c69b1_resources > drwxrwxr-x 2 yi.zhu yi.zhu4096 Jun 7 18:13 > ee0a2ee7-ff3e-4346-9181-e8e491b1ca15_resources > drwxr-xr-x 2 yi.zhu yi.zhu4096 Jun 7 18:16 > hsperfdata_yi.zhu > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39400) spark-sql remain hive resource download dir after exit
[ https://issues.apache.org/jira/browse/SPARK-39400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-39400: --- Assignee: angerszhu > spark-sql remain hive resource download dir after exit > -- > > Key: SPARK-39400 > URL: https://issues.apache.org/jira/browse/SPARK-39400 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.4.0 > > > {code:java} > drwxrwxr-x 2 yi.zhu yi.zhu4096 Jun 7 18:06 > da92eec4-2db1-4941-9e53-b28c38e25e31_resources > drwxrwxr-x 2 yi.zhu yi.zhu4096 Jun 7 18:14 > dad364e8-ed1d-4ced-a6df-4897361c69b1_resources > drwxrwxr-x 2 yi.zhu yi.zhu4096 Jun 7 18:13 > ee0a2ee7-ff3e-4346-9181-e8e491b1ca15_resources > drwxr-xr-x 2 yi.zhu yi.zhu4096 Jun 7 18:16 > hsperfdata_yi.zhu > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.
[ https://issues.apache.org/jira/browse/SPARK-39419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39419: Assignee: (was: Apache Spark) > When the comparator of ArraySort returns null, it should fail. > -- > > Key: SPARK-39419 > URL: https://issues.apache.org/jira/browse/SPARK-39419 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > When the comparator of {{ArraySort}} returns {{null}}, currently it handles > it as {{0}} (equal). > According to the doc, > {quote} > It returns -1, 0, or 1 as the first element is less than, equal to, or > greater than the second element. If the comparator function returns other > values (including null), the function will fail and raise an error. > {quote} > It's fine to return non -1, 0, 1 integers to follow the Java convention > (still need to update the doc, though), but it should throw an exception for > {{null}} result. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.
[ https://issues.apache.org/jira/browse/SPARK-39419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551867#comment-17551867 ] Apache Spark commented on SPARK-39419: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/36812 > When the comparator of ArraySort returns null, it should fail. > -- > > Key: SPARK-39419 > URL: https://issues.apache.org/jira/browse/SPARK-39419 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > When the comparator of {{ArraySort}} returns {{null}}, currently it handles > it as {{0}} (equal). > According to the doc, > {quote} > It returns -1, 0, or 1 as the first element is less than, equal to, or > greater than the second element. If the comparator function returns other > values (including null), the function will fail and raise an error. > {quote} > It's fine to return non -1, 0, 1 integers to follow the Java convention > (still need to update the doc, though), but it should throw an exception for > {{null}} result. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.
[ https://issues.apache.org/jira/browse/SPARK-39419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551866#comment-17551866 ] Apache Spark commented on SPARK-39419: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/36812 > When the comparator of ArraySort returns null, it should fail. > -- > > Key: SPARK-39419 > URL: https://issues.apache.org/jira/browse/SPARK-39419 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > When the comparator of {{ArraySort}} returns {{null}}, currently it handles > it as {{0}} (equal). > According to the doc, > {quote} > It returns -1, 0, or 1 as the first element is less than, equal to, or > greater than the second element. If the comparator function returns other > values (including null), the function will fail and raise an error. > {quote} > It's fine to return non -1, 0, 1 integers to follow the Java convention > (still need to update the doc, though), but it should throw an exception for > {{null}} result. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.
[ https://issues.apache.org/jira/browse/SPARK-39419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39419: Assignee: Apache Spark > When the comparator of ArraySort returns null, it should fail. > -- > > Key: SPARK-39419 > URL: https://issues.apache.org/jira/browse/SPARK-39419 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > When the comparator of {{ArraySort}} returns {{null}}, currently it handles > it as {{0}} (equal). > According to the doc, > {quote} > It returns -1, 0, or 1 as the first element is less than, equal to, or > greater than the second element. If the comparator function returns other > values (including null), the function will fail and raise an error. > {quote} > It's fine to return non -1, 0, 1 integers to follow the Java convention > (still need to update the doc, though), but it should throw an exception for > {{null}} result. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.
Takuya Ueshin created SPARK-39419: - Summary: When the comparator of ArraySort returns null, it should fail. Key: SPARK-39419 URL: https://issues.apache.org/jira/browse/SPARK-39419 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Takuya Ueshin When the comparator of {{ArraySort}} returns {{null}}, currently it handles it as {{0}} (equal). According to the doc, {quote} It returns -1, 0, or 1 as the first element is less than, equal to, or greater than the second element. If the comparator function returns other values (including null), the function will fail and raise an error. {quote} It's fine to return non -1, 0, 1 integers to follow the Java convention (still need to update the doc, though), but it should throw an exception for {{null}} result. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
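The intended semantics described above can be sketched in pure Python (a hypothetical model, not Spark's implementation): a null (None) comparator result must raise rather than being silently treated as 0.

```python
import functools

def array_sort(values, comparator):
    """Sort with a user comparator; fail on a null (None) comparator result."""
    def checked(a, b):
        result = comparator(a, b)
        if result is None:
            # Mirror the documented behavior: null is an error, not "equal".
            raise ValueError("comparator returned null for (%r, %r)" % (a, b))
        return result
    return sorted(values, key=functools.cmp_to_key(checked))
```

Any non-zero integer sign works, following the Java convention noted in the description; only a None result is rejected.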
[jira] [Commented] (SPARK-39418) DECODE docs refer to Oracle instead of Spark
[ https://issues.apache.org/jira/browse/SPARK-39418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551801#comment-17551801 ] Bruce Robbins commented on SPARK-39418: --- Possibly a dup of SPARK-39286? > DECODE docs refer to Oracle instead of Spark > > > Key: SPARK-39418 > URL: https://issues.apache.org/jira/browse/SPARK-39418 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Serge Rielau >Priority: Critical > > [https://spark.apache.org/docs/latest/api/sql/index.html#decode] > If no match is found, then {color:#de350b}Oracle{color} returns default. If > default is omitted, returns null. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39418) DECODE docs refer to Oracle instead of Spark
Serge Rielau created SPARK-39418: Summary: DECODE docs refer to Oracle instead of Spark Key: SPARK-39418 URL: https://issues.apache.org/jira/browse/SPARK-39418 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 3.2.0 Reporter: Serge Rielau [https://spark.apache.org/docs/latest/api/sql/index.html#decode] If no match is found, then {color:#de350b}Oracle{color} returns default. If default is omitted, returns null. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
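For context, the documented behavior (the offending word aside) can be modeled in pure Python — a hypothetical sketch of `decode(expr, search1, result1, ..., [default])` semantics, not Spark's implementation:

```python
def decode(expr, *args):
    """Return the result paired with the first matching search value.

    If no match is found, return the trailing default; if the default is
    omitted, return None (SQL NULL).
    """
    pairs, default = args, None
    if len(args) % 2 == 1:  # odd arg count => last arg is the default
        pairs, default = args[:-1], args[-1]
    for i in range(0, len(pairs), 2):
        if pairs[i] == expr:
            return pairs[i + 1]
    return default
```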
[jira] [Resolved] (SPARK-39393) Parquet data source only supports push-down predicate filters for non-repeated primitive types
[ https://issues.apache.org/jira/browse/SPARK-39393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39393. Fix Version/s: 3.1.3 3.3.0 3.2.2 3.4.0 Assignee: Amin Borjian Resolution: Fixed > Parquet data source only supports push-down predicate filters for > non-repeated primitive types > -- > > Key: SPARK-39393 > URL: https://issues.apache.org/jira/browse/SPARK-39393 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1 >Reporter: Amin Borjian >Assignee: Amin Borjian >Priority: Major > Labels: parquet > Fix For: 3.1.3, 3.3.0, 3.2.2, 3.4.0 > > > I use an example to illustrate the problem. The reason for the problem and > the problem-solving approach are stated below. > Assume follow Protocol buffer schema: > {code:java} > message Model { > string name = 1; > repeated string keywords = 2; > } > {code} > Suppose a parquet file is created from a set of records in the above format > with the help of the {{parquet-protobuf}} library. > Using Spark version 3.0.2 or older, we could run the following query using > {{{}spark-shell{}}}: > {code:java} > val data = spark.read.parquet("/path/to/parquet") > data.registerTempTable("models") > spark.sql("select * from models where array_contains(keywords, > 'X')").show(false) > {code} > But after updating Spark, we get the following error: > {code:java} > Caused by: java.lang.IllegalArgumentException: FilterPredicates do not > currently support repeated columns. Column keywords is repeated. 
> at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176) > at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149) > at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89) > at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56) > at > org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192) > at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61) > at > org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95) > at > org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45) > at > org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149) > at > org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72) > at > org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870) > at > org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789) > at > org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657) > at > org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162) > at > org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127) > ... > {code} > At first it seems the problem is the parquet library. 
But in fact, our > problem is because of this line that has been around since 2014 (based on Git > history): > [Parquet Schema Compatibility > Validator|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java#L194] > After some check, I notice that the cause of the problem is due to a change > in the data filtering conditions: > {code:java} > spark.sql("select * from log where array_contains(keywords, > 'X')").explain(true); > // Spark 3.0.2 and older > == Physical Plan == > ... > +- FileScan parquet [link#0,keywords#1] > DataFilters: [array_contains(keywords#1, Google)] > PushedFilters: [] > ... > // Spark 3.1.0 and newer > == Physical Plan == ... > +- FileScan parquet [link#0,keywords#1] > DataFilters: [isnotnull(keywords#1), array_contains(keywords#1, Google)] > PushedFilters: [IsNotNull(keywords)] > ...{code} > It's good that the filtering section has become smarter. Unfortunately, due > to unfamiliarity with code base, I could not find the exact location of the > change and rela
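The workaround shape implied by the report — keep the smarter IsNotNull data filter but stop pushing filters on repeated columns down to Parquet — can be sketched in pure Python (hypothetical structures; the real fix lives in Spark's Parquet filter conversion):

```python
def pushable_filters(filters, repeated_columns):
    """Drop filters on repeated (array) columns, which parquet-mr's
    SchemaCompatibilityValidator rejects; everything else may be pushed down."""
    return [f for f in filters if f["column"] not in repeated_columns]

# Mirrors the physical plan above: 'keywords' is repeated, so its
# IsNotNull filter must stay a data filter and not be pushed to Parquet.
data_filters = [{"column": "link", "op": "IsNotNull"},
                {"column": "keywords", "op": "IsNotNull"}]
```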
[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-39417: --- Target Version/s: 3.3.0 > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Prashant Singh >Priority: Major > > For partitions with null values we get an NPE on partition discovery; earlier we > used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-39417: --- Fix Version/s: (was: 3.3.0) > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Prashant Singh >Priority: Major > > For partitions with null values we get an NPE on partition discovery; earlier we > used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39412) IllegalStateException from connector does not work well with error class framework
[ https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-39412. -- Fix Version/s: 3.3.1 3.4.0 Resolution: Fixed Issue resolved by pull request 36804 [https://github.com/apache/spark/pull/36804] > IllegalStateException from connector does not work well with error class > framework > -- > > Key: SPARK-39412 > URL: https://issues.apache.org/jira/browse/SPARK-39412 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Jungtaek Lim >Assignee: Max Gekk >Priority: Blocker > Fix For: 3.3.1, 3.4.0 > > Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, > kafka-dataloss-error-msg-in-spark-3-3-or-master.log > > > With SPARK-39346, Spark SQL binds several exceptions to the internal error, > and produces different guidance on dealing with the exception. This assumes > these exceptions are only used for noticing internal bugs. > This applies to "connectors" as well, and introduces side-effect on the error > log. For Kafka data source, it is a breaking and unacceptable change, because > there is an important use case Kafka data source determines a case of > "dataloss", and throws IllegalStateException with instruction message on > workaround. > I mentioned this as "important" use case, because it can even happen with > some valid scenarios - streaming query has some maintenance period and > Kafka's retention on topic removes some records in the meanwhile. > Two problems arise: > 1) This does not mean Spark has a bug and end users have to report, hence the > guidance message on internal error is misleading. > 2) Most importantly, instruction message is shown after a long stack trace. > With the modification of existing test suite, I see the message being > appeared in "line 90" of the error log. > We should roll the right error message back, at least for Kafka's case. 
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39412) IllegalStateException from connector does not work well with error class framework
[ https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-39412: Assignee: Max Gekk > IllegalStateException from connector does not work well with error class > framework > -- > > Key: SPARK-39412 > URL: https://issues.apache.org/jira/browse/SPARK-39412 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Jungtaek Lim >Assignee: Max Gekk >Priority: Blocker > Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, > kafka-dataloss-error-msg-in-spark-3-3-or-master.log > > > With SPARK-39346, Spark SQL binds several exceptions to the internal error, > and produces different guidance on dealing with the exception. This assumes > these exceptions are only used for noticing internal bugs. > This applies to "connectors" as well, and introduces side-effect on the error > log. For Kafka data source, it is a breaking and unacceptable change, because > there is an important use case Kafka data source determines a case of > "dataloss", and throws IllegalStateException with instruction message on > workaround. > I mentioned this as "important" use case, because it can even happen with > some valid scenarios - streaming query has some maintenance period and > Kafka's retention on topic removes some records in the meanwhile. > Two problems arise: > 1) This does not mean Spark has a bug and end users have to report, hence the > guidance message on internal error is misleading. > 2) Most importantly, instruction message is shown after a long stack trace. > With the modification of existing test suite, I see the message being > appeared in "line 90" of the error log. > We should roll the right error message back, at least for Kafka's case. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Singh updated SPARK-39417: --- Affects Version/s: 3.3.0 (was: 3.2.1) > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > For partitions with null values we get an NPE on partition discovery; earlier > we used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551731#comment-17551731 ] Prashant Singh commented on SPARK-39417: Apologies, the problem I think is only in 3.3.0; it seems to be introduced in https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e Let me correct it. > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > For partitions with null values we get an NPE on partition discovery; earlier > we used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551722#comment-17551722 ] Josh Rosen commented on SPARK-39417: I see that the "affected versions" field is currently set to 3.2.1. Does this problem actually occur in that version, or is it a regression in 3.3.0? > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > For partitions with null values we get an NPE on partition discovery; earlier > we used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Assigned] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39417: Assignee: Apache Spark > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.0 > > > For partitions with null values we get an NPE on partition discovery; earlier > we used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Assigned] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39417: Assignee: (was: Apache Spark) > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > For partitions with null values we get an NPE on partition discovery; earlier > we used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551715#comment-17551715 ] Apache Spark commented on SPARK-39417: -- User 'singhpk234' has created a pull request for this issue: https://github.com/apache/spark/pull/36810 > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > For partitions with null values we get an NPE on partition discovery; earlier > we used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Singh updated SPARK-39417: --- Description: For partitions with null values we get an NPE on partition discovery; earlier we used to get `DEFAULT_PARTITION_NAME` {quote} [info] java.lang.NullPointerException: [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) [info] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) [info] at scala.collection.Iterator.foreach(Iterator.scala:943) [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote} was: A table with partitions with null values fails with an NPE during partition discovery. {quote} [info] java.lang.NullPointerException: [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) [info] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) [info] at scala.collection.Iterator.foreach(Iterator.scala:943) [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote} > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > For partitions with null values we get an NPE on partition discovery; earlier > we used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551706#comment-17551706 ] Prashant Singh commented on SPARK-39417: PR: https://github.com/apache/spark/pull/36810/files > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > A table with partitions with null values fails with an NPE during partition > discovery. > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Singh updated SPARK-39417: --- Description: A table with partitions with null values fails with an NPE during partition discovery. {quote} [info] java.lang.NullPointerException: [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) [info] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) [info] at scala.collection.Iterator.foreach(Iterator.scala:943) [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote} was: A table with partitions with null values fails with an NPE during partition discovery. > [info] java.lang.NullPointerException: [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) [info] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) [info] at scala.collection.Iterator.foreach(Iterator.scala:943) [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > A table with partitions with null values fails with an NPE during partition > discovery. > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Singh updated SPARK-39417: --- Description: A table with partitions with null values fails with an NPE during partition discovery. > [info] java.lang.NullPointerException: [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) [info] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) [info] at scala.collection.Iterator.foreach(Iterator.scala:943) [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) was: Partitions with null values fail with an NPE now. ``` [info] - Null partition value *** FAILED *** (142 milliseconds) [info] java.lang.NullPointerException: [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) [info] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) [info] at scala.collection.Iterator.foreach(Iterator.scala:943) [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) ``` > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > A table with partitions with null values fails with an NPE during partition > discovery. > > > [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943)
[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551701#comment-17551701 ] Prashant Singh commented on SPARK-39417: Adding a PR for this shortly. > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Prashant Singh >Priority: Major > Fix For: 3.3.0 > > > Partitions with null values fail with an NPE now. > > ``` > [info] - Null partition value *** FAILED *** (142 milliseconds) > [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > ```
[jira] [Created] (SPARK-39417) Handle Null partition values in PartitioningUtils
Prashant Singh created SPARK-39417: -- Summary: Handle Null partition values in PartitioningUtils Key: SPARK-39417 URL: https://issues.apache.org/jira/browse/SPARK-39417 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Prashant Singh Fix For: 3.3.0 Partitions with null values fail with an NPE now. ``` [info] - Null partition value *** FAILED *** (142 milliseconds) [info] java.lang.NullPointerException: [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) [info] at org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) [info] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) [info] at scala.collection.Iterator.foreach(Iterator.scala:943) [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) ```
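For readers following the NPE above: the crash happens because a null partition value is dereferenced while building the partition path fragment, whereas Hive-style layouts render null values as a default placeholder name. The following is a minimal, self-contained sketch of the null-safe behavior the issue asks for; the class name `PathFragment` and method shape are illustrative, not Spark's actual `PartitioningUtils` code (only the `__HIVE_DEFAULT_PARTITION__` placeholder convention is taken from Hive).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PathFragment {
    // Hive's conventional placeholder for null partition values.
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    // Builds "col1=v1/col2=v2" from partition values. A null value is
    // rendered as DEFAULT_PARTITION_NAME instead of being dereferenced,
    // which is what triggered the NullPointerException in the report above.
    static String getPathFragment(Map<String, Object> partitionValues) {
        return partitionValues.entrySet().stream()
            .map(e -> e.getKey() + "=" +
                (e.getValue() == null ? DEFAULT_PARTITION_NAME
                                      : e.getValue().toString()))
            .collect(Collectors.joining("/"));
    }

    public static void main(String[] args) {
        Map<String, Object> parts = new LinkedHashMap<>();
        parts.put("year", 2022);
        parts.put("region", null); // previously the NPE case
        System.out.println(getPathFragment(parts));
        // year=2022/region=__HIVE_DEFAULT_PARTITION__
    }
}
```

The key point is simply that the null check happens before any method call on the value, so discovery degrades to the default partition name rather than failing.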
[jira] [Updated] (SPARK-39416) When raising an exception, pass parameters as a map instead of an array
[ https://issues.apache.org/jira/browse/SPARK-39416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Rielau updated SPARK-39416: - Description: We have moved away from C-style parameters in error message texts towards symbolic parameters. E.g. {code:java} "CANNOT_CAST_DATATYPE" : { "message" : [ "Cannot cast <sourceType> to <targetType>." ], "sqlState" : "22005" },{code} {{However, when we raise an exception, we merely pass a simple array and assume positional assignment.}} {code:java} def cannotCastFromNullTypeError(to: DataType): Throwable = { new SparkException(errorClass = "CANNOT_CAST_DATATYPE", messageParameters = Array(NullType.typeName, to.typeName), null) }{code} This has multiple downsides: # It's not possible to mention the same parameter twice in an error message. # When reworking an error message we cannot shuffle parameters without changing the code. # There is a risk that the error message and the exception go out of sync unnoticed, given we do not want to check for the message text in the code. So in this PR we propose the following new usage: {code:java} def cannotCastFromNullTypeError(to: DataType): Throwable = { new SparkException(errorClass = "CANNOT_CAST_DATATYPE", messageParameters = Map("sourceType" -> NullType.typeName, "targetType" -> to.typeName), context = null) }{code} getMessage will then substitute the parameters in the message appropriately. Moving forward this should be the preferred way to raise exceptions. was: We have moved away from C-style parameters in error message texts towards symbolic parameters. E.g. {code:java} "CANNOT_CAST_DATATYPE" : { "message" : [ "Cannot cast <sourceType> to <targetType>." ], "sqlState" : "22005" },{code} {{However, when we raise an exception, we merely pass a simple array and assume positional assignment.}} {code:java} def cannotCastFromNullTypeError(to: DataType): Throwable = { new SparkException(errorClass = "CANNOT_CAST_DATATYPE", messageParameters = Array(NullType.typeName, to.typeName), null) }{code} This has multiple downsides: # It's not possible to mention the same parameter twice in an error message. # When reworking an error message we cannot shuffle parameters without changing the code. # There is a risk that the error message and the exception go out of sync unnoticed, given we do not want to check for the message text in the code. So in this PR we propose the following new usage: {code:java} def cannotCastFromNullTypeError(to: DataType): Throwable = { new SparkException(errorClass = "CANNOT_CAST_DATATYPE", messageParameters = Map("sourceType" -> NullType.typeName, "targetType" -> to.typeName), context = null) }{code} getMessage will then substitute the parameters in the message appropriately. Moving forward this should be the preferred way to raise exceptions. > When raising an exception, pass parameters as a map instead of an array > --- > > Key: SPARK-39416 > URL: https://issues.apache.org/jira/browse/SPARK-39416 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.1 >Reporter: Serge Rielau >Priority: Major > > We have moved away from C-style parameters in error message texts towards > symbolic parameters. E.g. > > {code:java} > "CANNOT_CAST_DATATYPE" : { > "message" : [ > "Cannot cast <sourceType> to <targetType>." > ], > "sqlState" : "22005" > },{code} > {{However, when we raise an exception, we merely pass a simple array and > assume positional assignment.}} > {code:java} > def cannotCastFromNullTypeError(to: DataType): Throwable = { > new SparkException(errorClass = "CANNOT_CAST_DATATYPE", > messageParameters = Array(NullType.typeName, to.typeName), null) > }{code} > > This has multiple downsides: > # It's not possible to mention the same parameter twice in an error message. > # When reworking an error message we cannot shuffle parameters without > changing the code. > # There is a risk that the error message and the exception go out of sync > unnoticed, given we do not want to check for the message text in the code. > So in this PR we propose the following new usage: > {code:java} > def cannotCastFromNullTypeError(to: DataType): Throwable = { > new SparkException(errorClass = "CANNOT_CAST_DATATYPE", > messageParameters = Map("sourceType" -> NullType.typeName, "targetType" > -> to.typeName), > context = null) > }{code} > getMessage will then substitute the parameters in the message appropriately. > Moving forward this should be the preferred way to raise exceptions.
[jira] [Created] (SPARK-39416) When raising an exception, pass parameters as a map instead of an array
Serge Rielau created SPARK-39416: Summary: When raising an exception, pass parameters as a map instead of an array Key: SPARK-39416 URL: https://issues.apache.org/jira/browse/SPARK-39416 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.1 Reporter: Serge Rielau We have moved away from C-style parameters in error message texts towards symbolic parameters. E.g. {code:java} "CANNOT_CAST_DATATYPE" : { "message" : [ "Cannot cast <sourceType> to <targetType>." ], "sqlState" : "22005" },{code} {{However, when we raise an exception, we merely pass a simple array and assume positional assignment.}} {code:java} def cannotCastFromNullTypeError(to: DataType): Throwable = { new SparkException(errorClass = "CANNOT_CAST_DATATYPE", messageParameters = Array(NullType.typeName, to.typeName), null) }{code} This has multiple downsides: # It's not possible to mention the same parameter twice in an error message. # When reworking an error message we cannot shuffle parameters without changing the code. # There is a risk that the error message and the exception go out of sync unnoticed, given we do not want to check for the message text in the code. So in this PR we propose the following new usage: {code:java} def cannotCastFromNullTypeError(to: DataType): Throwable = { new SparkException(errorClass = "CANNOT_CAST_DATATYPE", messageParameters = Map("sourceType" -> NullType.typeName, "targetType" -> to.typeName), context = null) }{code} getMessage will then substitute the parameters in the message appropriately. Moving forward this should be the preferred way to raise exceptions.
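To make the proposal concrete: with map-based parameters, getMessage only needs to substitute named placeholders into the stored template. The sketch below illustrates that substitution idea in plain Java; the `ErrorMessages` class, its `template` lookup, and the `<name>` placeholder syntax are illustrative assumptions modeled on the JSON snippet above, not Spark's actual implementation (in Spark the templates live in an error-classes JSON file).

```java
import java.util.Map;

public class ErrorMessages {
    // Hypothetical in-code template store standing in for the JSON file.
    static String template(String errorClass) {
        if (errorClass.equals("CANNOT_CAST_DATATYPE")) {
            return "Cannot cast <sourceType> to <targetType>.";
        }
        throw new IllegalArgumentException("unknown error class: " + errorClass);
    }

    // Substitutes named parameters into the template. Because parameters are
    // named, the same one may appear twice and the template can be reworded
    // or reordered without touching the raising code - the two advantages
    // the issue describes over positional arrays.
    static String getMessage(String errorClass, Map<String, String> params) {
        String msg = template(errorClass);
        for (Map.Entry<String, String> e : params.entrySet()) {
            msg = msg.replace("<" + e.getKey() + ">", e.getValue());
        }
        return msg;
    }

    public static void main(String[] args) {
        System.out.println(getMessage("CANNOT_CAST_DATATYPE",
            Map.of("sourceType", "NULL", "targetType", "INT")));
        // Cannot cast NULL to INT.
    }
}
```

A remaining design question (not addressed in this sketch) is whether unsubstituted or extra parameters should fail loudly, which is how the "message and exception out of sync" risk would actually get caught.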
[jira] [Resolved] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39413. Fix Version/s: 3.4.0 Assignee: jiaan.geng Resolution: Fixed > Capitalize sql keywords in JDBCV2Suite > -- > > Key: SPARK-39413 > URL: https://issues.apache.org/jira/browse/SPARK-39413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > JDBCV2Suite contains some test cases that use SQL keywords without > capitalization.
[jira] [Commented] (SPARK-39415) Local mode supports HadoopDelegationTokenManager
[ https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551647#comment-17551647 ] Apache Spark commented on SPARK-39415: -- User 'cxzl25' has created a pull request for this issue: https://github.com/apache/spark/pull/36808 > Local mode supports HadoopDelegationTokenManager > > > Key: SPARK-39415 > URL: https://issues.apache.org/jira/browse/SPARK-39415 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: dzcxzl >Priority: Minor > > Currently, in a Kerberos environment, using spark-submit --master=local > --proxy-user xxx cannot access the Hive Metastore, and using --keytab will > not automatically re-login. > {code:java} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483) > {code}
[jira] [Assigned] (SPARK-39415) Local mode supports HadoopDelegationTokenManager
[ https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39415: Assignee: (was: Apache Spark) > Local mode supports HadoopDelegationTokenManager > > > Key: SPARK-39415 > URL: https://issues.apache.org/jira/browse/SPARK-39415 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: dzcxzl >Priority: Minor > > Currently, in a Kerberos environment, using spark-submit --master=local > --proxy-user xxx cannot access the Hive Metastore, and using --keytab will > not automatically re-login. > {code:java} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483) > {code}
[jira] [Assigned] (SPARK-39415) Local mode supports HadoopDelegationTokenManager
[ https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39415: Assignee: Apache Spark > Local mode supports HadoopDelegationTokenManager > > > Key: SPARK-39415 > URL: https://issues.apache.org/jira/browse/SPARK-39415 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: dzcxzl >Assignee: Apache Spark >Priority: Minor > > Currently, in a Kerberos environment, using spark-submit --master=local > --proxy-user xxx cannot access the Hive Metastore, and using --keytab will > not automatically re-login. > {code:java} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483) > {code}
[jira] [Updated] (SPARK-39415) Local mode supports HadoopDelegationTokenManager
[ https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated SPARK-39415: --- Summary: Local mode supports HadoopDelegationTokenManager (was: Local mode supports delegationTokenManager) > Local mode supports HadoopDelegationTokenManager > > > Key: SPARK-39415 > URL: https://issues.apache.org/jira/browse/SPARK-39415 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: dzcxzl >Priority: Minor > > Currently, in a Kerberos environment, using spark-submit --master=local > --proxy-user xxx cannot access the Hive Metastore, and using --keytab does not > automatically re-login. > {code:java} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39415) Local mode supports delegationTokenManager
dzcxzl created SPARK-39415: -- Summary: Local mode supports delegationTokenManager Key: SPARK-39415 URL: https://issues.apache.org/jira/browse/SPARK-39415 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.1 Reporter: dzcxzl Currently, in a Kerberos environment, using spark-submit --master=local --proxy-user xxx cannot access the Hive Metastore, and using --keytab does not automatically re-login. {code:java} javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483) {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39414) Upgrade Scala to 2.12.16
[ https://issues.apache.org/jira/browse/SPARK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39414: Assignee: (was: Apache Spark) > Upgrade Scala to 2.12.16 > > > Key: SPARK-39414 > URL: https://issues.apache.org/jira/browse/SPARK-39414 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/scala/scala/releases/tag/v2.12.16 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39414) Upgrade Scala to 2.12.16
[ https://issues.apache.org/jira/browse/SPARK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551596#comment-17551596 ] Apache Spark commented on SPARK-39414: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36807 > Upgrade Scala to 2.12.16 > > > Key: SPARK-39414 > URL: https://issues.apache.org/jira/browse/SPARK-39414 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/scala/scala/releases/tag/v2.12.16 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39414) Upgrade Scala to 2.12.16
[ https://issues.apache.org/jira/browse/SPARK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39414: Assignee: Apache Spark > Upgrade Scala to 2.12.16 > > > Key: SPARK-39414 > URL: https://issues.apache.org/jira/browse/SPARK-39414 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > https://github.com/scala/scala/releases/tag/v2.12.16 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39414) Upgrade Scala to 2.12.16
[ https://issues.apache.org/jira/browse/SPARK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551595#comment-17551595 ] Apache Spark commented on SPARK-39414: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36807 > Upgrade Scala to 2.12.16 > > > Key: SPARK-39414 > URL: https://issues.apache.org/jira/browse/SPARK-39414 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/scala/scala/releases/tag/v2.12.16 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39414) Upgrade Scala to 2.12.16
Yang Jie created SPARK-39414: Summary: Upgrade Scala to 2.12.16 Key: SPARK-39414 URL: https://issues.apache.org/jira/browse/SPARK-39414 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie https://github.com/scala/scala/releases/tag/v2.12.16 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework
[ https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-38852: --- Description: Currently, Spark supports push down Filters and Aggregates to data source. However, the Data Source V2 operator pushdown framework has the following shortcomings: # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Aggregate push down does not support group by expressions # Aggregate push down does not support not use aggregate functions. # Offset push down is not supported # Paging push down is not supported was: Currently, Spark supports push down Filters and Aggregates to data source. However, the Data Source V2 operator pushdown framework has the following shortcomings: # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Aggregate push down does not support group by expressions # Offset push down is not supported # Paging push down is not supported > Better Data Source V2 operator pushdown framework > - > > Key: SPARK-38852 > URL: https://issues.apache.org/jira/browse/SPARK-38852 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark supports push down Filters and Aggregates to data source. 
> However, the Data Source V2 operator pushdown framework has the following > shortcomings: > # Only simple filter and aggregate are supported, which makes it impossible > to apply in most scenarios > # The incompatibility of SQL syntax makes it impossible to apply in most > scenarios > # Aggregate push down does not support multiple partitions of data sources > # Spark's additional aggregate will cause some overhead > # Limit push down is not supported > # Top n push down is not supported > # Aggregate push down does not support group by expressions > # Aggregate push down does not support not use aggregate functions. > # Offset push down is not supported > # Paging push down is not supported -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework
[ https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-38852: --- Description: Currently, Spark supports push down Filters and Aggregates to data source. However, the Data Source V2 operator pushdown framework has the following shortcomings: # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Aggregate push down does not support group by expressions # Aggregate push down does not support not use aggregate functions # Offset push down is not supported # Paging push down is not supported was: Currently, Spark supports push down Filters and Aggregates to data source. However, the Data Source V2 operator pushdown framework has the following shortcomings: # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Aggregate push down does not support group by expressions # Aggregate push down does not support not use aggregate functions. 
# Offset push down is not supported # Paging push down is not supported > Better Data Source V2 operator pushdown framework > - > > Key: SPARK-38852 > URL: https://issues.apache.org/jira/browse/SPARK-38852 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark supports push down Filters and Aggregates to data source. > However, the Data Source V2 operator pushdown framework has the following > shortcomings: > # Only simple filter and aggregate are supported, which makes it impossible > to apply in most scenarios > # The incompatibility of SQL syntax makes it impossible to apply in most > scenarios > # Aggregate push down does not support multiple partitions of data sources > # Spark's additional aggregate will cause some overhead > # Limit push down is not supported > # Top n push down is not supported > # Aggregate push down does not support group by expressions > # Aggregate push down does not support not use aggregate functions > # Offset push down is not supported > # Paging push down is not supported -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework
[ https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-38852: --- Description: Currently, Spark supports push down Filters and Aggregates to data source. However, the Data Source V2 operator pushdown framework has the following shortcomings: # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Aggregate push down does not support group by expressions # Offset push down is not supported # Paging push down is not supported was: Currently, Spark supports push down Filters and Aggregates to data source. However, the Data Source V2 operator pushdown framework has the following shortcomings: # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Aggregate push down does not support group by expressions # Offset push down is not supported > Better Data Source V2 operator pushdown framework > - > > Key: SPARK-38852 > URL: https://issues.apache.org/jira/browse/SPARK-38852 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark supports push down Filters and Aggregates to data source. 
> However, the Data Source V2 operator pushdown framework has the following > shortcomings: > # Only simple filter and aggregate are supported, which makes it impossible > to apply in most scenarios > # The incompatibility of SQL syntax makes it impossible to apply in most > scenarios > # Aggregate push down does not support multiple partitions of data sources > # Spark's additional aggregate will cause some overhead > # Limit push down is not supported > # Top n push down is not supported > # Aggregate push down does not support group by expressions > # Offset push down is not supported > # Paging push down is not supported -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39398) when doing iteration compute in graphx, checkpoint need support storagelevel
[ https://issues.apache.org/jira/browse/SPARK-39398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39398: Assignee: (was: Apache Spark) > when doing iteration compute in graphx, checkpoint need support storagelevel > - > > Key: SPARK-39398 > URL: https://issues.apache.org/jira/browse/SPARK-39398 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 3.2.1 >Reporter: wangwenli >Priority: Major > > This issue is related to SPARK-30502; that issue only fixed some of the ML > algorithms. > In GraphX Pregel, inside the iterative computation, the message checkpointer > should also support setting the storage level, rather than the default memory-only level -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39398) when doing iteration compute in graphx, checkpoint need support storagelevel
[ https://issues.apache.org/jira/browse/SPARK-39398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39398: Assignee: Apache Spark > when doing iteration compute in graphx, checkpoint need support storagelevel > - > > Key: SPARK-39398 > URL: https://issues.apache.org/jira/browse/SPARK-39398 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 3.2.1 >Reporter: wangwenli >Assignee: Apache Spark >Priority: Major > > This issue is related to SPARK-30502; that issue only fixed some of the ML > algorithms. > In GraphX Pregel, inside the iterative computation, the message checkpointer > should also support setting the storage level, rather than the default memory-only level -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39398) when doing iteration compute in graphx, checkpoint need support storagelevel
[ https://issues.apache.org/jira/browse/SPARK-39398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551497#comment-17551497 ] Apache Spark commented on SPARK-39398: -- User 'wwli05' has created a pull request for this issue: https://github.com/apache/spark/pull/36806 > when doing iteration compute in graphx, checkpoint need support storagelevel > - > > Key: SPARK-39398 > URL: https://issues.apache.org/jira/browse/SPARK-39398 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 3.2.1 >Reporter: wangwenli >Priority: Major > > This issue is related to SPARK-30502; that issue only fixed some of the ML > algorithms. > In GraphX Pregel, inside the iterative computation, the message checkpointer > should also support setting the storage level, rather than the default memory-only level -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39411) Release candidates do not have the correct version for PySpark
[ https://issues.apache.org/jira/browse/SPARK-39411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39411: - Fix Version/s: 3.3.1 > Release candidates do not have the correct version for PySpark > -- > > Key: SPARK-39411 > URL: https://issues.apache.org/jira/browse/SPARK-39411 > Project: Spark > Issue Type: Bug > Components: Build, PySpark >Affects Versions: 3.3.1 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Critical > Fix For: 3.4.0, 3.3.1 > > > https://github.com/apache/spark/blob/v3.3.0-rc5/dev/create-release/release-tag.sh#L88 > fails to replace the version in > https://github.com/apache/spark/blob/v3.3.0-rc5/python/pyspark/version.py#L19 > because the version line now carries a {code}: str ={code} type hint ... -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
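The failure mode above can be sketched in a few lines: a substitution keyed on the old `__version__ = ...` form of the version line silently matches nothing once the line carries a type annotation. The patterns below are illustrative stand-ins, not the exact sed expression used by release-tag.sh.

```python
import re

# Before the type hint was added, python/pyspark/version.py had a plain
# assignment; afterwards the line carries a ": str" annotation.
old_style = '__version__ = "3.4.0.dev0"'
annotated = '__version__: str = "3.4.0.dev0"'

# A pattern keyed on the old form matches the plain assignment but not the
# annotated line, so the dev version would leak into the release candidate.
narrow = re.compile(r'__version__ = .*')
assert narrow.match(old_style) is not None
assert narrow.match(annotated) is None

# Tolerating an optional annotation makes the replacement work for both forms.
tolerant = re.compile(r'__version__(?:: str)? = .*')
assert tolerant.match(old_style) is not None
assert tolerant.match(annotated) is not None
print(tolerant.sub('__version__: str = "3.3.0"', annotated))  # __version__: str = "3.3.0"
```

The same idea applies whether the rewrite is done with sed or Python: the match must account for the annotated form of the line.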
[jira] [Resolved] (SPARK-39404) Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame
[ https://issues.apache.org/jira/browse/SPARK-39404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-39404. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36801 [https://github.com/apache/spark/pull/36801] > Unable to query _metadata in streaming if getBatch returns multiple logical > nodes in the DataFrame > -- > > Key: SPARK-39404 > URL: https://issues.apache.org/jira/browse/SPARK-39404 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1 >Reporter: Yaohua Zhao >Assignee: Yaohua Zhao >Priority: Major > Fix For: 3.4.0 > > > Here: > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L585] > > We should probably use `transform` instead of `match` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
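The `transform`-vs-`match` distinction behind this fix can be shown with a toy tree (hypothetical node names, not Spark's actual plan classes): a top-level pattern match only inspects the root of the plan, while a recursive transform rewrites a matching node wherever it sits, which is what matters when getBatch returns a DataFrame whose logical plan has extra nodes above the relation.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Toy stand-ins for logical plan nodes (illustrative, not Spark's API).
@dataclass
class Node:
    name: str
    child: Optional["Node"] = None

def rewrite_root_only(plan: Node) -> Node:
    # Analogous to `plan match { case r: Relation => ... }`:
    # only the root node is inspected.
    if plan.name == "relation":
        return replace(plan, name="relation+metadata")
    return plan

def rewrite_recursively(plan: Node) -> Node:
    # Analogous to `plan.transform { ... }`: the rule is applied to every
    # node, so a relation nested under a projection is still rewritten.
    child = rewrite_recursively(plan.child) if plan.child else None
    node = replace(plan, child=child)
    if node.name == "relation":
        node = replace(node, name="relation+metadata")
    return node

# getBatch may return a plan with nodes above the relation, e.g. a projection.
plan = Node("project", Node("relation"))

assert rewrite_root_only(plan).child.name == "relation"             # missed
assert rewrite_recursively(plan).child.name == "relation+metadata"  # rewritten
```

With a single-node plan both approaches behave the same, which is why the bug only surfaces once the source returns a multi-node DataFrame.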
[jira] [Assigned] (SPARK-39404) Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame
[ https://issues.apache.org/jira/browse/SPARK-39404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-39404: Assignee: Yaohua Zhao > Unable to query _metadata in streaming if getBatch returns multiple logical > nodes in the DataFrame > -- > > Key: SPARK-39404 > URL: https://issues.apache.org/jira/browse/SPARK-39404 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1 >Reporter: Yaohua Zhao >Assignee: Yaohua Zhao >Priority: Major > > Here: > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L585] > > We should probably use `transform` instead of `match` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551465#comment-17551465 ] Igor Berman edited comment on SPARK-23207 at 6/8/22 8:36 AM: - We are still facing this issue in production with v3.1.2 at very large workloads. It happens very rarely, but it still happens. Our attempts to reproduce the problem with the reproduction above have failed, so at this point we have no reproduction; we will update if we find one. We are running on Mesos with dynamic allocation. was (Author: igor.berman): We are still facing this issue in production with v3.1.2 at very large workloads. It happens very rarely, but it still happens. Our attempts to reproduce the problem with the reproduction above have failed, so at this point we have no reproduction; we will update if we find one. > Shuffle+Repartition on an DataFrame could lead to incorrect answers > --- > > Key: SPARK-23207 > URL: https://issues.apache.org/jira/browse/SPARK-23207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Xingbo Jiang >Assignee: Xingbo Jiang >Priority: Blocker > Labels: correctness > Fix For: 2.1.4, 2.2.3, 2.3.0 > > > Currently, shuffle repartition uses RoundRobinPartitioning; the generated > result is nondeterministic because the ordering of the input rows is not > determined. > The bug can be triggered when there is a repartition call following a shuffle > (which would lead to non-deterministic row ordering), as the pattern shows > below: > upstream stage -> repartition stage -> result stage > (-> indicates a shuffle) > When one of the executor processes goes down, some tasks of the repartition > stage will be retried and generate an inconsistent ordering, and some tasks of > the result stage will be retried, generating different data. 
> The following code returns 931532 instead of 1000000: > {code:java} > import scala.sys.process._ > import org.apache.spark.TaskContext > val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x => > x > }.repartition(200).map { x => > if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) { > throw new Exception("pkill -f java".!!) > } > x > } > res.distinct().count() > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551465#comment-17551465 ] Igor Berman commented on SPARK-23207: - We are still facing this issue in production with v3.1.2 at very large workloads. It happens very rarely, but it still happens. Our attempts to reproduce the problem with the reproduction above have failed, so at this point we have no reproduction; we will update if we find one. > Shuffle+Repartition on an DataFrame could lead to incorrect answers > --- > > Key: SPARK-23207 > URL: https://issues.apache.org/jira/browse/SPARK-23207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Xingbo Jiang >Assignee: Xingbo Jiang >Priority: Blocker > Labels: correctness > Fix For: 2.1.4, 2.2.3, 2.3.0 > > > Currently, shuffle repartition uses RoundRobinPartitioning; the generated > result is nondeterministic because the ordering of the input rows is not > determined. > The bug can be triggered when there is a repartition call following a shuffle > (which would lead to non-deterministic row ordering), as the pattern shows > below: > upstream stage -> repartition stage -> result stage > (-> indicates a shuffle) > When one of the executor processes goes down, some tasks of the repartition > stage will be retried and generate an inconsistent ordering, and some tasks of > the result stage will be retried, generating different data. > The following code returns 931532 instead of 1000000: > {code:java} > import scala.sys.process._ > import org.apache.spark.TaskContext > val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x => > x > }.repartition(200).map { x => > if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) { > throw new Exception("pkill -f java".!!) 
> } > x > } > res.distinct().count() > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
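The order sensitivity of RoundRobinPartitioning described in SPARK-23207 can be shown with a minimal sketch (plain Python, not Spark's implementation): the partition a row lands in depends purely on its position in the input sequence, so a retried upstream task that emits rows in a different order reassigns rows across partitions.

```python
from itertools import cycle

def round_robin(rows, num_partitions):
    # Assign each row to a partition purely by its position in the input.
    parts = [[] for _ in range(num_partitions)]
    for row, p in zip(rows, cycle(range(num_partitions))):
        parts[p].append(row)
    return parts

first_attempt = round_robin([1, 2, 3, 4, 5, 6], 2)
# A retried task may receive the same rows in a different order after a
# nondeterministic shuffle; the same rows then land in different partitions.
retried = round_robin([2, 1, 3, 4, 5, 6], 2)

assert first_attempt == [[1, 3, 5], [2, 4, 6]]
assert retried == [[2, 3, 5], [1, 4, 6]]
# Downstream tasks that were not retried still hold output from the first
# assignment, so rows can end up duplicated in one partition's lineage and
# missing from another's, which is how rows go missing in the repro above.
```

The fix for SPARK-23207 avoids this by making the row order deterministic (a local sort) before the round-robin assignment, so a retried task reproduces the same partitioning.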
[jira] [Assigned] (SPARK-39412) IllegalStateException from connector does not work well with error class framework
[ https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39412: Assignee: (was: Apache Spark) > IllegalStateException from connector does not work well with error class > framework > -- > > Key: SPARK-39412 > URL: https://issues.apache.org/jira/browse/SPARK-39412 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Jungtaek Lim >Priority: Blocker > Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, > kafka-dataloss-error-msg-in-spark-3-3-or-master.log > > > With SPARK-39346, Spark SQL binds several exceptions to the internal error > and produces different guidance on dealing with the exception. This assumes > these exceptions are only used to signal internal bugs. > This applies to "connectors" as well, and introduces a side effect on the error > log. For the Kafka data source, it is a breaking and unacceptable change, because > there is an important use case where the Kafka data source detects > "dataloss" and throws IllegalStateException with an instruction message describing a > workaround. > I call this use case "important" because it can happen even in > valid scenarios - a streaming query has a maintenance period and > Kafka's topic retention removes some records in the meantime. > Two problems arise: > 1) This does not mean Spark has a bug that end users have to report, so the > guidance message about an internal error is misleading. > 2) Most importantly, the instruction message is shown after a long stack trace. > With the modification of the existing test suite, I see the message appear > at "line 90" of the error log. > We should restore the right error message, at least for Kafka's case. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39412) IllegalStateException from connector does not work well with error class framework
[ https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39412: Assignee: Apache Spark > IllegalStateException from connector does not work well with error class > framework > -- > > Key: SPARK-39412 > URL: https://issues.apache.org/jira/browse/SPARK-39412 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Blocker > Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, > kafka-dataloss-error-msg-in-spark-3-3-or-master.log > > > With SPARK-39346, Spark SQL binds several exceptions to the internal error > and produces different guidance on dealing with the exception. This assumes > these exceptions are only used to signal internal bugs. > This applies to "connectors" as well, and introduces a side effect on the error > log. For the Kafka data source, it is a breaking and unacceptable change, because > there is an important use case where the Kafka data source detects > "dataloss" and throws IllegalStateException with an instruction message describing a > workaround. > I call this use case "important" because it can happen even in > valid scenarios - a streaming query has a maintenance period and > Kafka's topic retention removes some records in the meantime. > Two problems arise: > 1) This does not mean Spark has a bug that end users have to report, so the > guidance message about an internal error is misleading. > 2) Most importantly, the instruction message is shown after a long stack trace. > With the modification of the existing test suite, I see the message appear > at "line 90" of the error log. > We should restore the right error message, at least for Kafka's case. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39413: Assignee: (was: Apache Spark) > Capitalize sql keywords in JDBCV2Suite > -- > > Key: SPARK-39413 > URL: https://issues.apache.org/jira/browse/SPARK-39413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > JDBCV2Suite contains some test cases that use SQL keywords that are not capitalized. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39413: Assignee: Apache Spark > Capitalize sql keywords in JDBCV2Suite > -- > > Key: SPARK-39413 > URL: https://issues.apache.org/jira/browse/SPARK-39413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > JDBCV2Suite contains some test cases that use SQL keywords that are not capitalized. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551464#comment-17551464 ] Apache Spark commented on SPARK-39413: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/36805 > Capitalize sql keywords in JDBCV2Suite > -- > > Key: SPARK-39413 > URL: https://issues.apache.org/jira/browse/SPARK-39413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > JDBCV2Suite contains some test cases that use SQL keywords that are not capitalized. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39412) IllegalStateException from connector does not work well with error class framework
[ https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551463#comment-17551463 ] Apache Spark commented on SPARK-39412: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/36804 > IllegalStateException from connector does not work well with error class > framework > -- > > Key: SPARK-39412 > URL: https://issues.apache.org/jira/browse/SPARK-39412 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Jungtaek Lim >Priority: Blocker > Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, > kafka-dataloss-error-msg-in-spark-3-3-or-master.log > > > With SPARK-39346, Spark SQL binds several exceptions to the internal error > class and produces different guidance on dealing with the exception. This > assumes these exceptions are only used to flag internal bugs. > This applies to "connectors" as well, and introduces a side effect in the > error log. For the Kafka data source, this is a breaking and unacceptable > change, because there is an important use case where the Kafka data source > detects "data loss" and throws IllegalStateException with an instruction > message describing the workaround. > I call this use case "important" because it can happen even in valid > scenarios - e.g. a streaming query pauses for a maintenance period and Kafka's > topic retention removes some records in the meantime. > Two problems arise: > 1) This does not mean Spark has a bug that end users must report, so the > guidance message about an internal error is misleading. > 2) Most importantly, the instruction message is shown only after a long stack > trace. With the modification of the existing test suite, I see the message > appear at "line 90" of the error log. > We should roll back to the right error message, at least for Kafka's case. 
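The problem described in SPARK-39412 can be sketched abstractly (hypothetical Python, not Spark's actual error-class framework): once a connector's actionable exception is rewrapped as a generic internal error, the user sees a bug-report prompt instead of leading with the workaround hint:

```python
class DataLossError(Exception):
    """Stand-in for the connector's IllegalStateException carrying a workaround hint."""

def wrap_as_internal_error(exc: Exception) -> RuntimeError:
    # Mimics treating any such exception as an internal bug: the message now
    # opens with a bug-report prompt, and the actionable hint is pushed behind
    # it (in real Spark, behind a long stack trace as well).
    return RuntimeError(f"Internal error. Please file a bug report. Original: {exc}")

# failOnDataLoss is a real Kafka source option; the message text is invented.
actionable = DataLossError(
    "Some data may have been lost because offsets were aged out by retention; "
    "set the source option failOnDataLoss=false to continue."
)
wrapped = wrap_as_internal_error(actionable)
print(str(wrapped).startswith("Internal error"))  # → True
```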
[jira] [Resolved] (SPARK-39411) Release candidates do not have the correct version for PySpark
[ https://issues.apache.org/jira/browse/SPARK-39411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39411. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36803 [https://github.com/apache/spark/pull/36803] > Release candidates do not have the correct version for PySpark > -- > > Key: SPARK-39411 > URL: https://issues.apache.org/jira/browse/SPARK-39411 > Project: Spark > Issue Type: Bug > Components: Build, PySpark >Affects Versions: 3.3.1 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Critical > Fix For: 3.4.0 > > > https://github.com/apache/spark/blob/v3.3.0-rc5/dev/create-release/release-tag.sh#L88 > fails to replace the version in > https://github.com/apache/spark/blob/v3.3.0-rc5/python/pyspark/version.py#L19 > because now we have {code}: str ={code} hint ...
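The mismatch in SPARK-39411 can be sketched as follows (patterns here are illustrative Python regexes, not the actual sed command in release-tag.sh): a pattern written for `__version__ = "..."` stops matching once the `: str` annotation is present, while one that tolerates an optional annotation matches both forms:

```python
import re

# Hypothetical patterns illustrating the failure mode; the real script uses sed.
old_pattern = re.compile(r'^__version__ = "([^"]+)"$')
fixed_pattern = re.compile(r'^__version__(?:\s*:\s*str)? = "([^"]+)"$')

annotated = '__version__: str = "3.3.0.dev0"'  # current version.py style
plain = '__version__ = "3.3.0.dev0"'           # older style the pattern expected

print(old_pattern.match(annotated))              # → None: type hint breaks the match
print(fixed_pattern.match(annotated).group(1))   # → 3.3.0.dev0
print(fixed_pattern.match(plain).group(1))       # → 3.3.0.dev0
```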