[jira] [Resolved] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39417.

Fix Version/s: 3.3.0
   3.4.0
   Resolution: Fixed

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Prashant Singh
>Assignee: Prashant Singh
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> For partitions with null values, we get an NPE on partition discovery; earlier we
> used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
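
A minimal sketch (an editor's illustration, not the actual patch) of the null-safe handling the fix implies: when building a partition path fragment, a null partition value should map to Spark's default partition name (`__HIVE_DEFAULT_PARTITION__`) instead of having methods called on it.

{code:java}
// Hypothetical helper for illustration only; names do not match PartitioningUtils.
val defaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

def pathFragment(partitionSpec: Map[String, String]): String =
  partitionSpec
    .map { case (column, value) =>
      // Null-safe: fall back to the default partition name instead of throwing an NPE.
      val v = Option(value).getOrElse(defaultPartitionName)
      s"$column=$v"
    }
    .mkString("/")

// pathFragment(Map("dt" -> null, "country" -> "US"))
// == "dt=__HIVE_DEFAULT_PARTITION__/country=US"
{code}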






[jira] [Assigned] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39417:
---

Assignee: Prashant Singh

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Prashant Singh
>Assignee: Prashant Singh
>Priority: Major
>
> For partitions with null values, we get an NPE on partition discovery; earlier we
> used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}






[jira] [Resolved] (SPARK-39415) Local mode supports HadoopDelegationTokenManager

2022-06-08 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl resolved SPARK-39415.

Resolution: Duplicate

> Local mode supports HadoopDelegationTokenManager
> 
>
> Key: SPARK-39415
> URL: https://issues.apache.org/jira/browse/SPARK-39415
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> Currently, in a Kerberos environment, spark-submit --master=local
> --proxy-user xxx cannot access the Hive Metastore, and using --keytab does not
> automatically re-login.
> {code:java}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483)
> {code}






[jira] [Resolved] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39421.
--
Fix Version/s: 3.3.0
   3.2.2
   3.4.0
 Assignee: Hyukjin Kwon
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/36813

> Sphinx build fails with "node class 'meta' is already registered, its 
> visitors will be overridden"
> --
>
> Key: SPARK-39421
> URL: https://issues.apache.org/jira/browse/SPARK-39421
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.4.0
>
>
> {code}
> Moving to python/docs directory and building sphinx.
> Running Sphinx v3.0.4
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It 
> is required to set this environment variable to '1' in both driver and 
> executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you 
> but it does not work if there is a Spark context already launched.
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> {code}
> The Sphinx build apparently fails with the latest docutils (see also
> https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.






[jira] [Updated] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39421:
-
Affects Version/s: (was: 3.1.2)
   (was: 3.0.3)

> Sphinx build fails with "node class 'meta' is already registered, its 
> visitors will be overridden"
> --
>
> Key: SPARK-39421
> URL: https://issues.apache.org/jira/browse/SPARK-39421
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.1, 3.3.0, 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.4.0
>
>
> {code}
> Moving to python/docs directory and building sphinx.
> Running Sphinx v3.0.4
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It 
> is required to set this environment variable to '1' in both driver and 
> executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you 
> but it does not work if there is a Spark context already launched.
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> {code}
> The Sphinx build apparently fails with the latest docutils (see also
> https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.






[jira] [Commented] (SPARK-39425) Add migration guide for PS behavior changes

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551954#comment-17551954
 ] 

Apache Spark commented on SPARK-39425:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36816

> Add migration guide for PS behavior changes
> ---
>
> Key: SPARK-39425
> URL: https://issues.apache.org/jira/browse/SPARK-39425
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Assigned] (SPARK-39425) Add migration guide for PS behavior changes

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39425:


Assignee: Apache Spark

> Add migration guide for PS behavior changes
> ---
>
> Key: SPARK-39425
> URL: https://issues.apache.org/jira/browse/SPARK-39425
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-39425) Add migration guide for PS behavior changes

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551953#comment-17551953
 ] 

Apache Spark commented on SPARK-39425:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36816

> Add migration guide for PS behavior changes
> ---
>
> Key: SPARK-39425
> URL: https://issues.apache.org/jira/browse/SPARK-39425
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Assigned] (SPARK-39425) Add migration guide for PS behavior changes

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39425:


Assignee: (was: Apache Spark)

> Add migration guide for PS behavior changes
> ---
>
> Key: SPARK-39425
> URL: https://issues.apache.org/jira/browse/SPARK-39425
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Created] (SPARK-39426) Subquery star select creates broken plan in case of self join

2022-06-08 Thread Denis (Jira)
Denis created SPARK-39426:
-

 Summary: Subquery star select creates broken plan in case of self 
join
 Key: SPARK-39426
 URL: https://issues.apache.org/jira/browse/SPARK-39426
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.1
Reporter: Denis


Subquery star select creates broken plan in case of self join

How to reproduce: 
{code:java}
import spark.implicits._
spark.sparkContext.setCheckpointDir(Files.createTempDirectory("some-prefix").toFile.toString)
val frame = Seq(1).toDF("id").checkpoint()
val joined = frame
.join(frame, Seq("id"), "left")
.select("id")

joined
.join(joined, Seq("id"), "left")
.as("a")
.select("a.*"){code}
This query throws an exception: 
{code:java}
Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved 
attribute(s) id#7 missing from id#10,id#11 in operator !Project [id#7, id#10]. 
Attribute(s) with the same name appear in the operation: id. Please check if 
the right attribute(s) are used.;
Project [id#10, id#4]
+- SubqueryAlias a
   +- Project [id#10, id#4]
      +- Join LeftOuter, (id#4 = id#10)
         :- Project [id#4]
         :  +- Project [id#7, id#4]
         :     +- Join LeftOuter, (id#4 = id#7)
         :        :- LogicalRDD [id#4], false
         :        +- LogicalRDD [id#7], false
         +- Project [id#10]
            +- !Project [id#7, id#10]
               +- Join LeftOuter, (id#10 = id#11)
                  :- LogicalRDD [id#10], false
                  +- LogicalRDD [id#11], false    at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:51)
    at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:50)
    at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:182)
    at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:471)
    at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:262)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:262)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:262)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:262)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:262)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:262)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:262)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:262)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:262)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:262)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:262)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:262)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode

[jira] [Commented] (SPARK-39424) `Run documentation build` failed in master branch GA

2022-06-08 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551934#comment-17551934
 ] 

Yang Jie commented on SPARK-39424:
--

Thanks ~

> `Run documentation build` failed in master branch GA
> 
>
> Key: SPARK-39424
> URL: https://issues.apache.org/jira/browse/SPARK-39424
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
>                     
>       Jekyll 4.2.1   Please append `--trace` to the `build` command 
>                      for any additional information or backtrace. 
>                     
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
> Python doc generation failed (RuntimeError)
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in
>  `initialize'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `new'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `process'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `block in process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in
>  `block (2 levels) in init_with_program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `block in execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in
>  `go'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in
>  `program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in
>  `'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `load'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `'
> Error: Process completed with exit code 1. {code}
> The latest builds seem to have failed:
>  * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/6799560292?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/6801448545?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true]
>  
>  




[jira] [Resolved] (SPARK-39424) `Run documentation build` failed in master branch GA

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39424.
--
Resolution: Duplicate

Thanks! I actually already created a JIRA and a fix :-)


> `Run documentation build` failed in master branch GA
> 
>
> Key: SPARK-39424
> URL: https://issues.apache.org/jira/browse/SPARK-39424
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
>                     
>       Jekyll 4.2.1   Please append `--trace` to the `build` command 
>                      for any additional information or backtrace. 
>                     
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
> Python doc generation failed (RuntimeError)
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in
>  `initialize'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `new'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `process'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `block in process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in
>  `block (2 levels) in init_with_program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `block in execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in
>  `go'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in
>  `program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in
>  `'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `load'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `'
> Error: Process completed with exit code 1. {code}
> The latest builds seem to have failed:
>  * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/6799560292?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/6801448545?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true]

[jira] [Created] (SPARK-39425) Add migration guide for PS behavior changes

2022-06-08 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-39425:
---

 Summary: Add migration guide for PS behavior changes
 Key: SPARK-39425
 URL: https://issues.apache.org/jira/browse/SPARK-39425
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, Pandas API on Spark
Affects Versions: 3.4.0
Reporter: Yikun Jiang









[jira] [Commented] (SPARK-39424) `Run documentation build` failed in master branch GA

2022-06-08 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551925#comment-17551925
 ] 

Yang Jie commented on SPARK-39424:
--

cc [~hyukjin.kwon] 

> `Run documentation build` failed in master branch GA
> 
>
> Key: SPARK-39424
> URL: https://issues.apache.org/jira/browse/SPARK-39424
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
>                     
>       Jekyll 4.2.1   Please append `--trace` to the `build` command 
>                      for any additional information or backtrace. 
>                     
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
> Python doc generation failed (RuntimeError)
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in
>  `initialize'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `new'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `process'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `block in process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in
>  `block (2 levels) in init_with_program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `block in execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in
>  `go'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in
>  `program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in
>  `'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `load'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `'
> Error: Process completed with exit code 1. {code}
> The latest builds seem to have failed:
>  * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/6799560292?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/6801448545?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true]
>  
>  




[jira] [Created] (SPARK-39424) `Run documentation build` failed in master branch GA

2022-06-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-39424:


 Summary: `Run documentation build` failed in master branch GA
 Key: SPARK-39424
 URL: https://issues.apache.org/jira/browse/SPARK-39424
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


{code:java}
/__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
Warning: Latest version of pandas(>=1.4.0) is required to generate the 
documentation; however, your version was 1.3.5
  warnings.warn(
Warning, treated as error:
node class 'meta' is already registered, its visitors will be overridden
make: *** [Makefile:35: html] Error 2
                    
      Jekyll 4.2.1   Please append `--trace` to the `build` command 
                     for any additional information or backtrace. 
                    
/__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
Python doc generation failed (RuntimeError)
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
 `require'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
 `block in require_with_graceful_fail'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
 `each'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
 `require_with_graceful_fail'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
 `block in require_plugin_files'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
 `each'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
 `require_plugin_files'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
 `conscientious_require'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
 `setup'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in
 `initialize'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
 `new'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
 `process'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
 `block in process_with_graceful_fail'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
 `each'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
 `process_with_graceful_fail'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in
 `block (2 levels) in init_with_program'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
 `block in execute'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
 `each'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
 `execute'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in
 `go'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in
 `program'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in
 `'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
`load'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
`'
Error: Process completed with exit code 1. {code}
The latest builds seem to have failed:
 * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true]
 * [https://github.com/apache/spark/runs/6799560292?check_suite_focus=true]
 * [https://github.com/apache/spark/runs/6801448545?check_suite_focus=true]
 * [https://github.com/apache/spark/runs/6803919840?check_suite_focus=true]

 

 






[jira] [Resolved] (SPARK-39236) Make CreateTable API and ListTables API compatible

2022-06-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39236.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36586
[https://github.com/apache/spark/pull/36586]

> Make CreateTable API and ListTables API compatible 
> ---
>
> Key: SPARK-39236
> URL: https://issues.apache.org/jira/browse/SPARK-39236
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> https://github.com/apache/spark/blob/c6dccc7dd412a95007f5bb2584d69b85ff9ebf8e/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L364
> https://github.com/apache/spark/blob/c6dccc7dd412a95007f5bb2584d69b85ff9ebf8e/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L99






[jira] [Assigned] (SPARK-39236) Make CreateTable API and ListTables API compatible

2022-06-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39236:
---

Assignee: Rui Wang

> Make CreateTable API and ListTables API compatible 
> ---
>
> Key: SPARK-39236
> URL: https://issues.apache.org/jira/browse/SPARK-39236
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> https://github.com/apache/spark/blob/c6dccc7dd412a95007f5bb2584d69b85ff9ebf8e/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L364
> https://github.com/apache/spark/blob/c6dccc7dd412a95007f5bb2584d69b85ff9ebf8e/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L99






[jira] [Updated] (SPARK-39423) Spark Sql create table using jdbc add preSql option

2022-06-08 Thread WeiNan Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WeiNan Zhao updated SPARK-39423:

Description: 
While recently using Spark SQL, I tried to create a Spark table with
CREATE TABLE ... USING jdbc, but before inserting I may need to delete some of the
previous data, so I would like such a preSql option to be exposed. A usage example
is shown in the picture below.

I can submit a pull request to solve this problem. Please assign it to me.
Thanks.

!image-2022-06-09-10-48-05-347.png!

  was:
While recently using Spark SQL, I tried to create a Spark table with
CREATE TABLE ... USING jdbc, but before inserting I may need to delete some of the
previous data, so I would like such a preSql option to be exposed. A usage example
is shown in the picture below.

 

!image-2022-06-09-10-47-25-558.png!


> Spark Sql create table using jdbc add preSql option
> ---
>
> Key: SPARK-39423
> URL: https://issues.apache.org/jira/browse/SPARK-39423
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: WeiNan Zhao
>Priority: Major
> Attachments: image-2022-06-09-10-48-05-347.png
>
>
> While recently using Spark SQL, I tried to create a Spark table with
> CREATE TABLE ... USING jdbc, but before inserting I may need to delete some of
> the previous data, so I would like such a preSql option to be exposed. A usage
> example is shown in the picture below.
> I can submit a pull request to solve this problem. Please assign it to me.
> Thanks.
> !image-2022-06-09-10-48-05-347.png!






[jira] [Updated] (SPARK-39423) Spark Sql create table using jdbc add preSql option

2022-06-08 Thread WeiNan Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WeiNan Zhao updated SPARK-39423:

Attachment: image-2022-06-09-10-48-05-347.png

> Spark Sql create table using jdbc add preSql option
> ---
>
> Key: SPARK-39423
> URL: https://issues.apache.org/jira/browse/SPARK-39423
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: WeiNan Zhao
>Priority: Major
> Attachments: image-2022-06-09-10-48-05-347.png
>
>
> While recently using Spark SQL, I tried to create a Spark table with
> CREATE TABLE ... USING jdbc, but before inserting I may need to delete some of
> the previous data, so I would like such a preSql option to be exposed. A usage
> example is shown in the picture below.
>  
> !image-2022-06-09-10-47-25-558.png!






[jira] [Created] (SPARK-39423) Spark Sql create table using jdbc add preSql option

2022-06-08 Thread WeiNan Zhao (Jira)
WeiNan Zhao created SPARK-39423:
---

 Summary: Spark Sql create table using jdbc add preSql option
 Key: SPARK-39423
 URL: https://issues.apache.org/jira/browse/SPARK-39423
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.1
Reporter: WeiNan Zhao


While recently using Spark SQL, I tried to create a Spark table with
CREATE TABLE ... USING jdbc, but before inserting I may need to delete some of the
previous data, so I would like such a preSql option to be exposed. A usage example
is shown in the picture below.

 

!image-2022-06-09-10-47-25-558.png!
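
Since the referenced screenshot is not reproduced in this archive, the following is a hedged reconstruction of what the proposed usage appears to be. The `preSql` option does not exist in Spark today, and the connection details are placeholders.

{code:java}
// Hypothetical sketch of the proposed (non-existent) preSql option; all values are placeholders.
spark.sql("""
  CREATE TABLE jdbc_target
  USING jdbc
  OPTIONS (
    url 'jdbc:mysql://host:3306/db',
    dbtable 'target_table',
    user 'username',
    password 'password',
    preSql 'DELETE FROM target_table WHERE dt = 2022'
  )
""")
{code}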






[jira] [Commented] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551914#comment-17551914
 ] 

Apache Spark commented on SPARK-37670:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36815

> Support predicate pushdown and column pruning for de-duped CTEs
> ---
>
> Key: SPARK-37670
> URL: https://issues.apache.org/jira/browse/SPARK-37670
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wei Xue
>Assignee: Wei Xue
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>







[jira] [Commented] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551909#comment-17551909
 ] 

Apache Spark commented on SPARK-39422:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/36814

> SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported 
> serde configurations
> -
>
> Key: SPARK-39422
> URL: https://issues.apache.org/jira/browse/SPARK-39422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Minor
>
> If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported 
> Serde configuration, Spark will return an error message like
> {code:java}
> org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE 
> against table rcFileTable, which is created by Hive and uses the following 
> unsupported serde configuration
>  SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe 
> INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: 
> org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code}
> which is confusing to end users.
> In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS 
> SERDE` to users (similar to other error messages in this code path).






[jira] [Commented] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551907#comment-17551907
 ] 

Apache Spark commented on SPARK-39422:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/36814

> SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported 
> serde configurations
> -
>
> Key: SPARK-39422
> URL: https://issues.apache.org/jira/browse/SPARK-39422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Minor
>
> If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported 
> Serde configuration, Spark will return an error message like
> {code:java}
> org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE 
> against table rcFileTable, which is created by Hive and uses the following 
> unsupported serde configuration
>  SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe 
> INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: 
> org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code}
> which is confusing to end users.
> In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS 
> SERDE` to users (similar to other error messages in this code path).






[jira] [Created] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations

2022-06-08 Thread Josh Rosen (Jira)
Josh Rosen created SPARK-39422:
--

 Summary: SHOW CREATE TABLE should suggest 'AS SERDE' for Hive 
tables with unsupported serde configurations
 Key: SPARK-39422
 URL: https://issues.apache.org/jira/browse/SPARK-39422
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
 Environment: If you run `SHOW CREATE TABLE` against a Hive table which 
uses an unsupported Serde configuration, Spark will return an error message like
{code:java}
org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE 
against table rcFileTable, which is created by Hive and uses the following 
unsupported serde configuration
 SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe 
INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: 
org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code}
which is confusing to end users.

In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS 
SERDE` to users (similar to other error messages in this code path).
Reporter: Josh Rosen









[jira] [Updated] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations

2022-06-08 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-39422:
---
Priority: Minor  (was: Major)

> SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported 
> serde configurations
> -
>
> Key: SPARK-39422
> URL: https://issues.apache.org/jira/browse/SPARK-39422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Minor
>
> If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported 
> Serde configuration, Spark will return an error message like
> {code:java}
> org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE 
> against table rcFileTable, which is created by Hive and uses the following 
> unsupported serde configuration
>  SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe 
> INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: 
> org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code}
> which is confusing to end users.
> In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS 
> SERDE` to users (similar to other error messages in this code path).






[jira] [Updated] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations

2022-06-08 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-39422:
---
Description: 
If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported 
Serde configuration, Spark will return an error message like
{code:java}
org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE 
against table rcFileTable, which is created by Hive and uses the following 
unsupported serde configuration
 SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe 
INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: 
org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code}
which is confusing to end users.

In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS 
SERDE` to users (similar to other error messages in this code path).
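
For context, a hedged sketch of the fallback command the improved message would point users to; `SHOW CREATE TABLE ... AS SERDE` is existing Spark SQL syntax, while the table name and the `spark` session variable are placeholders taken from the example above.

{code:java}
// The command the error message should suggest, shown in a spark-shell style session.
// `rcFileTable` is the example table from the error text above.
spark.sql("SHOW CREATE TABLE rcFileTable AS SERDE").show(truncate = false)
{code}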

> SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported 
> serde configurations
> -
>
> Key: SPARK-39422
> URL: https://issues.apache.org/jira/browse/SPARK-39422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>
> If you run `SHOW CREATE TABLE` against a Hive table which uses an unsupported 
> Serde configuration, Spark will return an error message like
> {code:java}
> org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE 
> against table rcFileTable, which is created by Hive and uses the following 
> unsupported serde configuration
>  SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe 
> INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: 
> org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code}
> which is confusing to end users.
> In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS 
> SERDE` to users (similar to other error messages in this code path).






[jira] [Assigned] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations

2022-06-08 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reassigned SPARK-39422:
--

Assignee: Josh Rosen

> SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported 
> serde configurations
> -
>
> Key: SPARK-39422
> URL: https://issues.apache.org/jira/browse/SPARK-39422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: If you run `SHOW CREATE TABLE` against a Hive table 
> which uses an unsupported Serde configuration, Spark will return an error 
> message like
> {code:java}
> org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE 
> against table rcFileTable, which is created by Hive and uses the following 
> unsupported serde configuration
>  SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe 
> INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: 
> org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code}
> which is confusing to end users.
> In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS 
> SERDE` to users (similar to other error messages in this code path).
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>







[jira] [Updated] (SPARK-39422) SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations

2022-06-08 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-39422:
---
Environment: (was: If you run `SHOW CREATE TABLE` against a Hive table 
which uses an unsupported Serde configuration, Spark will return an error 
message like
{code:java}
org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE 
against table rcFileTable, which is created by Hive and uses the following 
unsupported serde configuration
 SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe 
INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: 
org.apache.hadoop.hive.ql.io.RCFileOutputFormat {code}
which is confusing to end users.

In this situation, I think the error should suggest `SHOW CREATE TABLE ... AS 
SERDE` to users (similar to other error messages in this code path).)

> SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported 
> serde configurations
> -
>
> Key: SPARK-39422
> URL: https://issues.apache.org/jira/browse/SPARK-39422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>







[jira] [Assigned] (SPARK-39349) Add a CheckError() method to SparkFunSuite

2022-06-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39349:
---

Assignee: Serge Rielau

> Add a CheckError() method to SparkFunSuite
> --
>
> Key: SPARK-39349
> URL: https://issues.apache.org/jira/browse/SPARK-39349
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.1
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>
> We want to standardize on a generic way to QA error messages without impeding 
> the ability to enhance/rework error messages.
> CheckError() allows for efficient assertions on the "payload":
>  * Error class, subclass
>  * SQLState
>  * Parameters (both names and values)
>  
> It does not test the actual English text, which is the feature.
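A hypothetical sketch of how such an assertion could look in a test; the helper name follows this ticket, but the exact signature, error class, and parameter names below are assumptions rather than the final API:

{code:scala}
// Illustrative only: assert on the error "payload" (class, state, parameters),
// not on the English message text.
val e = intercept[org.apache.spark.sql.AnalysisException] {
  spark.sql("SELECT * FROM nonexistent_table")
}
checkError(
  exception = e,
  errorClass = "TABLE_OR_VIEW_NOT_FOUND",                      // assumed error class name
  parameters = Map("relationName" -> "`nonexistent_table`"))   // assumed parameter name
{code}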



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39349) Add a CheckError() method to SparkFunSuite

2022-06-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39349.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36693
[https://github.com/apache/spark/pull/36693]

> Add a CheckError() method to SparkFunSuite
> --
>
> Key: SPARK-39349
> URL: https://issues.apache.org/jira/browse/SPARK-39349
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.1
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
> Fix For: 3.4.0
>
>
> We want to standardize on a generic way to QA error messages without impeding 
> the ability to enhance/rework error messages.
> CheckError() allows for efficient assertions on the "payload":
>  * Error class, subclass
>  * SQLState
>  * Parameters (both names and values)
>  
> It does not test the actual English text, which is the feature.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39410) Exclude rules in analyzer

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39410.
--
Resolution: Invalid

>  Exclude rules in analyzer
> --
>
> Key: SPARK-39410
> URL: https://issues.apache.org/jira/browse/SPARK-39410
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: shi yuhang
>Priority: Major
>
> I have found that we can use `spark.sql.optimizer.excludedRules` to exclude 
> rules in the optimizer. I'd like to have a similar capability in the analyzer.
> I don't know whether it is possible or whether it would break the design of catalyst?
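For context, a minimal sketch of the existing optimizer-side capability referenced above (the excluded rule below is only an example):

{code:scala}
// Excluding optimizer rules is already supported via a session config.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")

// With the rule excluded, constant folding no longer happens during optimization.
spark.sql("SELECT 1 + 1").explain(true)
{code}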



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39410) Exclude rules in analyzer

2022-06-08 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551887#comment-17551887
 ] 

Hyukjin Kwon commented on SPARK-39410:
--

Analyzer rules cannot be excluded because Spark SQL needs all of them in order to 
work. Optimizer rules can be excluded because, conceptually, a query should still 
work without any optimizer rules.

>  Exclude rules in analyzer
> --
>
> Key: SPARK-39410
> URL: https://issues.apache.org/jira/browse/SPARK-39410
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: shi yuhang
>Priority: Major
>
> I have found that we can use `spark.sql.optimizer.excludedRules` to exclude 
> rules in the optimizer. I'd like to have a similar capability in the analyzer.
> I don't know whether it is possible or whether it would break the design of catalyst?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39420) Support ANALYZE TABLE on v2 tables

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39420:
-
Priority: Major  (was: Blocker)

> Support ANALYZE TABLE on v2 tables
> --
>
> Key: SPARK-39420
> URL: https://issues.apache.org/jira/browse/SPARK-39420
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.2.1
>Reporter: Felipe
>Priority: Major
>
> According to [https://github.com/delta-io/delta/pull/840], to implement 
> ANALYZE TABLE in Delta, we need to add the missing APIs in Spark to allow a 
> data source to report the file set used to calculate the stats.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39421:


Assignee: (was: Apache Spark)

> Sphinx build fails with "node class 'meta' is already registered, its 
> visitors will be overridden"
> --
>
> Key: SPARK-39421
> URL: https://issues.apache.org/jira/browse/SPARK-39421
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> Moving to python/docs directory and building sphinx.
> Running Sphinx v3.0.4
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It 
> is required to set this environment variable to '1' in both driver and 
> executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you 
> but it does not work if there is a Spark context already launched.
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> {code}
> Sphinx build apparently fails with the latest docutils (see also 
> https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551886#comment-17551886
 ] 

Apache Spark commented on SPARK-39421:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/36813

> Sphinx build fails with "node class 'meta' is already registered, its 
> visitors will be overridden"
> --
>
> Key: SPARK-39421
> URL: https://issues.apache.org/jira/browse/SPARK-39421
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> Moving to python/docs directory and building sphinx.
> Running Sphinx v3.0.4
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It 
> is required to set this environment variable to '1' in both driver and 
> executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you 
> but it does not work if there is a Spark context already launched.
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> {code}
> Sphinx build apparently fails with the latest docutils (see also 
> https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39421:


Assignee: Apache Spark

> Sphinx build fails with "node class 'meta' is already registered, its 
> visitors will be overridden"
> --
>
> Key: SPARK-39421
> URL: https://issues.apache.org/jira/browse/SPARK-39421
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> Moving to python/docs directory and building sphinx.
> Running Sphinx v3.0.4
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It 
> is required to set this environment variable to '1' in both driver and 
> executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you 
> but it does not work if there is a Spark context already launched.
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> {code}
> Sphinx build apparently fails with the latest docutils (see also 
> https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39421:
-
Affects Version/s: 3.2.1
   3.1.2
   3.0.3
   3.3.0

> Sphinx build fails with "node class 'meta' is already registered, its 
> visitors will be overridden"
> --
>
> Key: SPARK-39421
> URL: https://issues.apache.org/jira/browse/SPARK-39421
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> Moving to python/docs directory and building sphinx.
> Running Sphinx v3.0.4
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It 
> is required to set this environment variable to '1' in both driver and 
> executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you 
> but it does not work if there is a Spark context already launched.
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> {code}
> Sphinx build apparently fails with the latest docutils (see also 
> https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"

2022-06-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-39421:


 Summary: Sphinx build fails with "node class 'meta' is already 
registered, its visitors will be overridden"
 Key: SPARK-39421
 URL: https://issues.apache.org/jira/browse/SPARK-39421
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.4.0
 Environment: {code}
Moving to python/docs directory and building sphinx.
Running Sphinx v3.0.4
WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is 
required to set this environment variable to '1' in both driver and executor 
sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it 
does not work if there is a Spark context already launched.
/__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
Warning: Latest version of pandas(>=1.4.0) is required to generate the 
documentation; however, your version was 1.3.5
  warnings.warn(
Warning, treated as error:
node class 'meta' is already registered, its visitors will be overridden
make: *** [Makefile:35: html] Error 2

  Jekyll 4.2.1   Please append `--trace` to the `build` command 
 for any additional information or backtrace. 

{code}

Sphinx build apparently fails with the latest docutils (see also 
https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39421:
-
Environment: (was: {code}
Moving to python/docs directory and building sphinx.
Running Sphinx v3.0.4
WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is 
required to set this environment variable to '1' in both driver and executor 
sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it 
does not work if there is a Spark context already launched.
/__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
Warning: Latest version of pandas(>=1.4.0) is required to generate the 
documentation; however, your version was 1.3.5
  warnings.warn(
Warning, treated as error:
node class 'meta' is already registered, its visitors will be overridden
make: *** [Makefile:35: html] Error 2

  Jekyll 4.2.1   Please append `--trace` to the `build` command 
 for any additional information or backtrace. 

{code}

Sphinx build apparently fails with the latest docutils (see also 
https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.)

> Sphinx build fails with "node class 'meta' is already registered, its 
> visitors will be overridden"
> --
>
> Key: SPARK-39421
> URL: https://issues.apache.org/jira/browse/SPARK-39421
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39421) Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39421:
-
Description: 
{code}
Moving to python/docs directory and building sphinx.
Running Sphinx v3.0.4
WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is 
required to set this environment variable to '1' in both driver and executor 
sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it 
does not work if there is a Spark context already launched.
/__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
Warning: Latest version of pandas(>=1.4.0) is required to generate the 
documentation; however, your version was 1.3.5
  warnings.warn(
Warning, treated as error:
node class 'meta' is already registered, its visitors will be overridden
make: *** [Makefile:35: html] Error 2

  Jekyll 4.2.1   Please append `--trace` to the `build` command 
 for any additional information or backtrace. 

{code}

Sphinx build apparently fails with the latest docutils (see also 
https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.

> Sphinx build fails with "node class 'meta' is already registered, its 
> visitors will be overridden"
> --
>
> Key: SPARK-39421
> URL: https://issues.apache.org/jira/browse/SPARK-39421
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> Moving to python/docs directory and building sphinx.
> Running Sphinx v3.0.4
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It 
> is required to set this environment variable to '1' in both driver and 
> executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you 
> but it does not work if there is a Spark context already launched.
> /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: 
> Warning: Latest version of pandas(>=1.4.0) is required to generate the 
> documentation; however, your version was 1.3.5
>   warnings.warn(
> Warning, treated as error:
> node class 'meta' is already registered, its visitors will be overridden
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> {code}
> Sphinx build apparently fails with the latest docutils (see also 
> https://issues.apache.org/jira/browse/FLINK-24662). We should pin the version.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39420) Support ANALYZE TABLE on v2 tables

2022-06-08 Thread Felipe (Jira)
Felipe created SPARK-39420:
--

 Summary: Support ANALYZE TABLE on v2 tables
 Key: SPARK-39420
 URL: https://issues.apache.org/jira/browse/SPARK-39420
 Project: Spark
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: 3.2.1
Reporter: Felipe


According to [https://github.com/delta-io/delta/pull/840], to implement ANALYZE 
TABLE in Delta, we need to add the missing APIs in Spark to allow a data source 
to report the file set used to calculate the stats.
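For reference, a minimal sketch of the statements this would enable for v2 tables; the table identifier is illustrative, and today these only compute stats for v1 tables:

{code:scala}
// Table-level statistics (row count, size in bytes).
spark.sql("ANALYZE TABLE db.events COMPUTE STATISTICS")

// Per-column statistics used by the cost-based optimizer.
spark.sql("ANALYZE TABLE db.events COMPUTE STATISTICS FOR ALL COLUMNS")
{code}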



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39418) DECODE docs refer to Oracle instead of Spark

2022-06-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39418.
-
Resolution: Duplicate

> DECODE docs refer to Oracle instead of Spark
> 
>
> Key: SPARK-39418
> URL: https://issues.apache.org/jira/browse/SPARK-39418
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Serge Rielau
>Priority: Critical
>
> [https://spark.apache.org/docs/latest/api/sql/index.html#decode]
> If no match is found, then {color:#de350b}Oracle{color} returns default. If 
> default is omitted, returns null.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39418) DECODE docs refer to Oracle instead of Spark

2022-06-08 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551877#comment-17551877
 ] 

Wenchen Fan commented on SPARK-39418:
-

yea this has been fixed by https://issues.apache.org/jira/browse/SPARK-39286

> DECODE docs refer to Oracle instead of Spark
> 
>
> Key: SPARK-39418
> URL: https://issues.apache.org/jira/browse/SPARK-39418
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Serge Rielau
>Priority: Critical
>
> [https://spark.apache.org/docs/latest/api/sql/index.html#decode]
> If no match is found, then {color:#de350b}Oracle{color} returns default. If 
> default is omitted, returns null.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39400) spark-sql remain hive resource download dir after exit

2022-06-08 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-39400.
-
Resolution: Fixed

Issue resolved by pull request 36786
[https://github.com/apache/spark/pull/36786]

> spark-sql remain hive resource download dir after exit
> --
>
> Key: SPARK-39400
> URL: https://issues.apache.org/jira/browse/SPARK-39400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> drwxrwxr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:06 
> da92eec4-2db1-4941-9e53-b28c38e25e31_resources
> drwxrwxr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:14 
> dad364e8-ed1d-4ced-a6df-4897361c69b1_resources
> drwxrwxr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:13 
> ee0a2ee7-ff3e-4346-9181-e8e491b1ca15_resources
> drwxr-xr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:16 
> hsperfdata_yi.zhu
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39400) spark-sql remain hive resource download dir after exit

2022-06-08 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-39400:
---

Assignee: angerszhu

> spark-sql remain hive resource download dir after exit
> --
>
> Key: SPARK-39400
> URL: https://issues.apache.org/jira/browse/SPARK-39400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> drwxrwxr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:06 
> da92eec4-2db1-4941-9e53-b28c38e25e31_resources
> drwxrwxr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:14 
> dad364e8-ed1d-4ced-a6df-4897361c69b1_resources
> drwxrwxr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:13 
> ee0a2ee7-ff3e-4346-9181-e8e491b1ca15_resources
> drwxr-xr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:16 
> hsperfdata_yi.zhu
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39419:


Assignee: (was: Apache Spark)

> When the comparator of ArraySort returns null, it should fail.
> --
>
> Key: SPARK-39419
> URL: https://issues.apache.org/jira/browse/SPARK-39419
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> When the comparator of {{ArraySort}} returns {{null}}, currently it handles 
> it as {{0}} (equal).
> According to the doc, 
> {quote}
> It returns -1, 0, or 1 as the first element is less than, equal to, or 
> greater than the second element. If the comparator function returns other 
> values (including null), the function will fail and raise an error.
> {quote}
> It's fine to return integers other than -1, 0, or 1 to follow the Java 
> convention (the doc still needs to be updated, though), but it should throw an 
> exception for a {{null}} result.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551867#comment-17551867
 ] 

Apache Spark commented on SPARK-39419:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/36812

> When the comparator of ArraySort returns null, it should fail.
> --
>
> Key: SPARK-39419
> URL: https://issues.apache.org/jira/browse/SPARK-39419
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> When the comparator of {{ArraySort}} returns {{null}}, currently it handles 
> it as {{0}} (equal).
> According to the doc, 
> {quote}
> It returns -1, 0, or 1 as the first element is less than, equal to, or 
> greater than the second element. If the comparator function returns other 
> values (including null), the function will fail and raise an error.
> {quote}
> It's fine to return integers other than -1, 0, or 1 to follow the Java 
> convention (the doc still needs to be updated, though), but it should throw an 
> exception for a {{null}} result.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551866#comment-17551866
 ] 

Apache Spark commented on SPARK-39419:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/36812

> When the comparator of ArraySort returns null, it should fail.
> --
>
> Key: SPARK-39419
> URL: https://issues.apache.org/jira/browse/SPARK-39419
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> When the comparator of {{ArraySort}} returns {{null}}, currently it handles 
> it as {{0}} (equal).
> According to the doc, 
> {quote}
> It returns -1, 0, or 1 as the first element is less than, equal to, or 
> greater than the second element. If the comparator function returns other 
> values (including null), the function will fail and raise an error.
> {quote}
> It's fine to return integers other than -1, 0, or 1 to follow the Java 
> convention (the doc still needs to be updated, though), but it should throw an 
> exception for a {{null}} result.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39419:


Assignee: Apache Spark

> When the comparator of ArraySort returns null, it should fail.
> --
>
> Key: SPARK-39419
> URL: https://issues.apache.org/jira/browse/SPARK-39419
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> When the comparator of {{ArraySort}} returns {{null}}, currently it handles 
> it as {{0}} (equal).
> According to the doc, 
> {quote}
> It returns -1, 0, or 1 as the first element is less than, equal to, or 
> greater than the second element. If the comparator function returns other 
> values (including null), the function will fail and raise an error.
> {quote}
> It's fine to return integers other than -1, 0, or 1 to follow the Java 
> convention (the doc still needs to be updated, though), but it should throw an 
> exception for a {{null}} result.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.

2022-06-08 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-39419:
-

 Summary: When the comparator of ArraySort returns null, it should 
fail.
 Key: SPARK-39419
 URL: https://issues.apache.org/jira/browse/SPARK-39419
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


When the comparator of {{ArraySort}} returns {{null}}, currently it handles it 
as {{0}} (equal).

According to the doc, 

{quote}
It returns -1, 0, or 1 as the first element is less than, equal to, or greater 
than the second element. If the comparator function returns other values 
(including null), the function will fail and raise an error.
{quote}

It's fine to return integers other than -1, 0, or 1 to follow the Java convention 
(the doc still needs to be updated, though), but it should throw an exception for 
a {{null}} result.
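A minimal sketch of the behaviour in question, using a comparator lambda that can return NULL (the data is illustrative):

{code:scala}
// The comparator below returns NULL whenever either element is NULL.
// Today the NULL result is silently treated as 0 (equal); per the documentation
// quoted above, it should raise an error instead.
spark.sql("""
  SELECT array_sort(
           array(3, NULL, 1),
           (l, r) -> IF(l IS NULL OR r IS NULL,
                        NULL,
                        CASE WHEN l < r THEN -1 WHEN l > r THEN 1 ELSE 0 END))
""").show(false)
{code}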



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39418) DECODE docs refer to Oracle instead of Spark

2022-06-08 Thread Bruce Robbins (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551801#comment-17551801
 ] 

Bruce Robbins commented on SPARK-39418:
---

Possibly a dup of SPARK-39286?

> DECODE docs refer to Oracle instead of Spark
> 
>
> Key: SPARK-39418
> URL: https://issues.apache.org/jira/browse/SPARK-39418
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Serge Rielau
>Priority: Critical
>
> [https://spark.apache.org/docs/latest/api/sql/index.html#decode]
> If no match is found, then {color:#de350b}Oracle{color} returns default. If 
> default is omitted, returns null.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39418) DECODE docs refer to Oracle instead of Spark

2022-06-08 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-39418:


 Summary: DECODE docs refer to Oracle instead of Spark
 Key: SPARK-39418
 URL: https://issues.apache.org/jira/browse/SPARK-39418
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.2.0
Reporter: Serge Rielau


[https://spark.apache.org/docs/latest/api/sql/index.html#decode]

If no match is found, then {color:#de350b}Oracle{color} returns default. If 
default is omitted, returns null.
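For reference, a minimal sketch of the documented behaviour that the wording should describe for Spark rather than Oracle (values are illustrative):

{code:scala}
// decode(expr, search1, result1, ..., default): when no search value matches,
// the default is returned; when the default is omitted, the result is NULL.
spark.sql("SELECT decode(3, 1, 'one', 2, 'two', 'other')").show()  // -> other
spark.sql("SELECT decode(3, 1, 'one', 2, 'two')").show()          // -> NULL
{code}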



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39393) Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39393.

Fix Version/s: 3.1.3
   3.3.0
   3.2.2
   3.4.0
 Assignee: Amin Borjian
   Resolution: Fixed

> Parquet data source only supports push-down predicate filters for 
> non-repeated primitive types
> --
>
> Key: SPARK-39393
> URL: https://issues.apache.org/jira/browse/SPARK-39393
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1
>Reporter: Amin Borjian
>Assignee: Amin Borjian
>Priority: Major
>  Labels: parquet
> Fix For: 3.1.3, 3.3.0, 3.2.2, 3.4.0
>
>
> I use an example to illustrate the problem. The reason for the problem and 
> the problem-solving approach are stated below.
> Assume the following Protocol Buffers schema:
> {code:java}
> message Model {
>  string name = 1;
>  repeated string keywords = 2;
> }
> {code}
> Suppose a parquet file is created from a set of records in the above format 
> with the help of the {{parquet-protobuf}} library.
> Using Spark version 3.0.2 or older, we could run the following query using 
> {{spark-shell}}:
> {code:java}
> val data = spark.read.parquet("/path/to/parquet")
> data.registerTempTable("models")
> spark.sql("select * from models where array_contains(keywords, 
> 'X')").show(false)
> {code}
> But after updating Spark, we get the following error:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: FilterPredicates do not 
> currently support repeated columns. Column keywords is repeated.
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
>   at 
> org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
>   at 
> org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
> ...
> {code}
> At first it seems the problem is in the Parquet library, but in fact our 
> problem is caused by this line, which has been around since 2014 (based on Git 
> history):
> [Parquet Schema Compatibility 
> Validator|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java#L194]
> After some checking, I noticed that the cause of the problem is a change 
> in the data filtering conditions:
> {code:java}
> spark.sql("select * from log where array_contains(keywords, 
> 'X')").explain(true);
> // Spark 3.0.2 and older
> == Physical Plan ==
> ... 
> +- FileScan parquet [link#0,keywords#1]
>   DataFilters: [array_contains(keywords#1, Google)]
>   PushedFilters: []
>   ...
> // Spark 3.1.0 and newer
> == Physical Plan == ... 
> +- FileScan parquet [link#0,keywords#1]
>   DataFilters: [isnotnull(keywords#1),  array_contains(keywords#1, Google)]
>   PushedFilters: [IsNotNull(keywords)]
>   ...{code}
> It's good that the filtering has become smarter. Unfortunately, due 
> to unfamiliarity with the code base, I could not find the exact location of the 
> change and rela
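While the root cause is tracked down, a possible mitigation sketch (not the merged fix) is to disable Parquet predicate pushdown so the repeated column is filtered by Spark after the scan:

{code:scala}
// Workaround sketch: turn off Parquet filter pushdown for the session, so the
// unsupported repeated-column predicate is evaluated by Spark rather than by the
// Parquet reader that rejects it.
spark.conf.set("spark.sql.parquet.filterPushdown", "false")

val data = spark.read.parquet("/path/to/parquet")
data.createOrReplaceTempView("models")
spark.sql("select * from models where array_contains(keywords, 'X')").show(false)
{code}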

[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-39417:
---
Target Version/s: 3.3.0

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Prashant Singh
>Priority: Major
>
> For partitions with null values we get an NPE during partition discovery; 
> earlier we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
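A minimal sketch of writing a partition that contains null values (paths and column names are illustrative; whether a given read hits the NPE depends on which code path calls getPathFragment):

{code:scala}
// Null partition values are normally rendered with the default partition name
// (__HIVE_DEFAULT_PARTITION__); the NPE above comes from the path-fragment helper
// no longer handling the null case.
import spark.implicits._

Seq((1, "a"), (2, null.asInstanceOf[String]))
  .toDF("id", "part")
  .write.partitionBy("part").mode("overwrite")
  .parquet("/tmp/spark-39417-demo")

spark.read.parquet("/tmp/spark-39417-demo").show(false)
{code}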



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-39417:
---
Fix Version/s: (was: 3.3.0)

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Prashant Singh
>Priority: Major
>
> For partitions with null values we get an NPE during partition discovery; 
> earlier we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39412) IllegalStateException from connector does not work well with error class framework

2022-06-08 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-39412.
--
Fix Version/s: 3.3.1
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 36804
[https://github.com/apache/spark/pull/36804]

> IllegalStateException from connector does not work well with error class 
> framework
> --
>
> Key: SPARK-39412
> URL: https://issues.apache.org/jira/browse/SPARK-39412
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Assignee: Max Gekk
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, 
> kafka-dataloss-error-msg-in-spark-3-3-or-master.log
>
>
> With SPARK-39346, Spark SQL binds several exceptions to the internal error, 
> and produces different guidance on dealing with the exception. This assumes 
> these exceptions are only used for noticing internal bugs.
> This applies to "connectors" as well, and introduces a side effect in the error 
> log. For the Kafka data source, it is a breaking and unacceptable change, 
> because there is an important use case where the Kafka data source determines a 
> case of "dataloss" and throws IllegalStateException with an instruction message 
> describing the workaround.
> I call this an "important" use case because it can happen even in valid 
> scenarios - a streaming query has a maintenance period and Kafka's retention on 
> the topic removes some records in the meanwhile.
> Two problems arise:
> 1) This does not mean Spark has a bug that end users have to report, so the 
> guidance message about an internal error is misleading.
> 2) Most importantly, the instruction message is shown after a long stack trace. 
> With the modification of the existing test suite, I see the message appearing 
> at "line 90" of the error log.
> We should roll the right error message back, at least for Kafka's case.
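For context, the instruction message in question points users at the Kafka source's data-loss handling; a minimal sketch of that option (servers and topic are illustrative):

{code:scala}
// The workaround the Kafka source's message describes: tolerate offsets that have
// aged out (e.g. removed by topic retention) instead of failing the query.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")  // illustrative
  .option("subscribe", "events")                      // illustrative
  .option("failOnDataLoss", "false")
  .load()
{code}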



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39412) IllegalStateException from connector does not work well with error class framework

2022-06-08 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-39412:


Assignee: Max Gekk

> IllegalStateException from connector does not work well with error class 
> framework
> --
>
> Key: SPARK-39412
> URL: https://issues.apache.org/jira/browse/SPARK-39412
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Assignee: Max Gekk
>Priority: Blocker
> Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, 
> kafka-dataloss-error-msg-in-spark-3-3-or-master.log
>
>
> With SPARK-39346, Spark SQL binds several exceptions to the internal error, 
> and produces different guidance on dealing with the exception. This assumes 
> these exceptions are only used for noticing internal bugs.
> This applies to "connectors" as well, and introduces a side effect in the error 
> log. For the Kafka data source, it is a breaking and unacceptable change, 
> because there is an important use case where the Kafka data source determines a 
> case of "dataloss" and throws IllegalStateException with an instruction message 
> describing the workaround.
> I call this an "important" use case because it can happen even in valid 
> scenarios - a streaming query has a maintenance period and Kafka's retention on 
> the topic removes some records in the meanwhile.
> Two problems arise:
> 1) This does not mean Spark has a bug that end users have to report, so the 
> guidance message about an internal error is misleading.
> 2) Most importantly, the instruction message is shown after a long stack trace. 
> With the modification of the existing test suite, I see the message appearing 
> at "line 90" of the error log.
> We should roll the right error message back, at least for Kafka's case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Prashant Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Singh updated SPARK-39417:
---
Affects Version/s: 3.3.0
   (was: 3.2.1)

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> For partitions with null values we get an NPE during partition discovery; 
> earlier we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Prashant Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551731#comment-17551731
 ] 

Prashant Singh commented on SPARK-39417:


Apologies, I think the problem is only in 3.3.0; it seems to have been introduced in 
https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e

Let me correct it.

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> For partitions with null values we get an NPE during partition discovery; 
> earlier we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Josh Rosen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551722#comment-17551722
 ] 

Josh Rosen commented on SPARK-39417:


I see that the "affected versions" is currently set to 3.2.1. Does this problem 
actually occur in that version or is it a regression in 3.3.0?

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> For partitions with null values we get an NPE during partition discovery; 
> earlier we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39417:


Assignee: Apache Spark

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>
> For partitions with null values we get an NPE during partition discovery; 
> earlier we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39417:


Assignee: (was: Apache Spark)

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> For partitions with null values we get an NPE during partition discovery; 
> earlier we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551715#comment-17551715
 ] 

Apache Spark commented on SPARK-39417:
--

User 'singhpk234' has created a pull request for this issue:
https://github.com/apache/spark/pull/36810

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> For partitions with null values we get an NPE during partition discovery; 
> earlier we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Prashant Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Singh updated SPARK-39417:
---
Description: 
For partitions with null values we get an NPE during partition discovery; earlier 
we used to get `DEFAULT_PARTITION_NAME`.

 

{quote} [info]   java.lang.NullPointerException:
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
[info]   at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
[info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
[info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}

  was:
A table with partitions with null values fails with an NPE during partition 
discovery.

 

{quote} [info]   java.lang.NullPointerException:
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
[info]   at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
[info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
[info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}


> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> For partitions with null values we get an NPE during partition discovery; 
> earlier we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Prashant Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551706#comment-17551706
 ] 

Prashant Singh commented on SPARK-39417:


PR : https://github.com/apache/spark/pull/36810/files

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> A table with partitions with null values fails with an NPE during partition 
> discovery.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Prashant Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Singh updated SPARK-39417:
---
Description: 
A table with partitions with null values fails with an NPE during partition 
discovery.

 

{quote} [info]   java.lang.NullPointerException:
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
[info]   at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
[info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
[info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}

  was:
A table with partitions with null values fails with an NPE during partition 
discovery.

 

> [info]   java.lang.NullPointerException:
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
[info]   at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
[info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
[info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)


> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> A table with partitions with null values fails with an NPE during partition 
> discovery.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Prashant Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Singh updated SPARK-39417:
---
Description: 
A table with partitions with null values fails with an NPE during partition 
discovery.

 

> [info]   java.lang.NullPointerException:
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
[info]   at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
[info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
[info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)

  was:
Partitions with null values fail with an NPE now.

 

```

[info] - Null partition value *** FAILED *** (142 milliseconds)
[info]   java.lang.NullPointerException:
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
[info]   at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
[info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
[info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)
```


> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> A table with partitions with null values fails with an NPE during partition 
> discovery.
>  
> > [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Prashant Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551701#comment-17551701
 ] 

Prashant Singh commented on SPARK-39417:


adding a PR for this shortly

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Prashant Singh
>Priority: Major
> Fix For: 3.3.0
>
>
> Partitions with null values fail with an NPE now.
>  
> ```
> [info] - Null partition value *** FAILED *** (142 milliseconds)
> [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)
> ```



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-08 Thread Prashant Singh (Jira)
Prashant Singh created SPARK-39417:
--

 Summary: Handle Null partition values in PartitioningUtils
 Key: SPARK-39417
 URL: https://issues.apache.org/jira/browse/SPARK-39417
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.1
Reporter: Prashant Singh
 Fix For: 3.3.0


Partitions with null values fail with an NPE now.

 

```

[info] - Null partition value *** FAILED *** (142 milliseconds)
[info]   java.lang.NullPointerException:
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
[info]   at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
[info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
[info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)
```



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39416) When raising an exception, pass parameters as a map instead of an array

2022-06-08 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-39416:
-
Description: 
We have moved away from c-style parameters in error message texts towards 
symbolic parameters. E.g.

 
{code:java}
"CANNOT_CAST_DATATYPE" : {
  "message" : [
"Cannot cast <sourceType> to <targetType>."
  ],
  "sqlState" : "22005"
},{code}
However, when we raise an exception we merely pass a simple array and assume 
positional assignment.
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
 messageParameters = Array(NullType.typeName, to.typeName), null)
}{code}
 

This has multiple downsides:
 # It's not possible to mention the same parameter twice in an error message.
 # When reworking an error message we cannot shuffle parameters without 
changing the code.
 # There is a risk that the error message and the exception go out of sync 
unnoticed, given we do not want to check for the message text in the code.

So in this PR we propose the following new usage:
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
messageParameters = Map("sourceType" -> NullType.typeName, "targetType" 
->to.typeName),
context = null)
}{code}
getMessage will then substitute the parameters in the message appropriately.

Moving forward this should be the preferred way to raise exceptions.

  was:
We have moved away from c-style parameters in error message texts towards 
symbolic parameters. E.g.

 
{code:java}
"CANNOT_CAST_DATATYPE" : {
  "message" : [
"Cannot cast <sourceType> to <targetType>."
  ],
  "sqlState" : "22005"
},{code}

However, when we raise an exception we merely pass a simple array and assume 
positional assignment.

{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
 messageParameters = Array(NullType.typeName, to.typeName), null)
}{code}
 

This has multiple downsides:
 # It's not possible to mention the same parameter twice in an error message.
 # When reworking an error message we cannot shuffle parameters without 
changing the code.
 # There is a risk that the error message and the exception go out of sync 
unnoticed, given we do not want to check for the message text in the code.


So in this PR we propose the following new usage:
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
messageParameters = Map("sourceType" -> NullType.typeName, "targetType" 
->to.typeName),
context = null)
}{code}
getMessage will then substitute the parameters in the message appropriately.

Moving forward this should be the preferred way to raise exceptions.


> When raising an exception, pass parameters as a map instead of an array
> ---
>
> Key: SPARK-39416
> URL: https://issues.apache.org/jira/browse/SPARK-39416
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.1
>Reporter: Serge Rielau
>Priority: Major
>
> We have moved away from c-style parameters in error message texts towards 
> symbolic parameters. E.g.
>  
> {code:java}
> "CANNOT_CAST_DATATYPE" : {
>   "message" : [
> "Cannot cast <sourceType> to <targetType>."
>   ],
>   "sqlState" : "22005"
> },{code}
> However, when we raise an exception we merely pass a simple array and assume 
> positional assignment.
> {code:java}
> def cannotCastFromNullTypeError(to: DataType): Throwable = {
>   new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
>  messageParameters = Array(NullType.typeName, to.typeName), null)
> }{code}
>  
> This has multiple downsides:
>  # It's not possible to mention the same parameter twice in an error message.
>  # When reworking an error message we cannot shuffle parameters without 
> changing the code.
>  # There is a risk that the error message and the exception go out of sync 
> unnoticed, given we do not want to check for the message text in the code.
> So in this PR we propose the following new usage:
> {code:java}
> def cannotCastFromNullTypeError(to: DataType): Throwable = {
>   new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
> messageParameters = Map("sourceType" -> NullType.typeName, "targetType" 
> ->to.typeName),
> context = null)
> }{code}
> getMessage will then substitute the parameters in the message appropriately.
> Moving forward this should be the preferred way to raise exceptions.
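
As a rough illustration of the map-based substitution described above (a sketch only; the placeholder syntax and the helper below are assumptions, not the actual SparkThrowable implementation):

{code:java}
// Toy substitution of <param> placeholders from a parameter map.
def formatMessage(template: String, parameters: Map[String, String]): String =
  parameters.foldLeft(template) { case (msg, (key, value)) =>
    msg.replace(s"<$key>", value)
  }

val template = "Cannot cast <sourceType> to <targetType>."
val msg = formatMessage(template, Map("sourceType" -> "NULL", "targetType" -> "INT"))
// msg == "Cannot cast NULL to INT."
{code}

With a map, the same parameter can appear several times in a template, and the parameters can be reordered in the message text without touching the calling code.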



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-39416) When raising an exception, pass parameters as a map instead of an array

2022-06-08 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-39416:


 Summary: When raising an exception, pass parameters as a map 
instead of an array
 Key: SPARK-39416
 URL: https://issues.apache.org/jira/browse/SPARK-39416
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.1
Reporter: Serge Rielau


We have moved away from c-style parameters in error message texts towards 
symbolic parameters. E.g.

 
{code:java}
"CANNOT_CAST_DATATYPE" : {
  "message" : [
"Cannot cast <sourceType> to <targetType>."
  ],
  "sqlState" : "22005"
},{code}

However, when we raise an exception we merely pass a simple array and assume 
positional assignment.

{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
 messageParameters = Array(NullType.typeName, to.typeName), null)
}{code}
 

This has multiple downsides:
 # It's not possible to mention the same parameter twice in an error message.
 # When reworking an error message we cannot shuffle parameters without 
changing the code.
 # There is a risk that the error message and the exception go out of sync 
unnoticed, given we do not want to check for the message text in the code.


So in this PR we propose the following new usage:
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
messageParameters = Map("sourceType" -> NullType.typeName, "targetType" 
->to.typeName),
context = null)
}{code}
getMessage will then substitute the parameters in the message appropriately.

Moving forward this should be the preferred way to raise exceptions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39413.

Fix Version/s: 3.4.0
 Assignee: jiaan.geng
   Resolution: Fixed

> Capitalize sql keywords in JDBCV2Suite
> --
>
> Key: SPARK-39413
> URL: https://issues.apache.org/jira/browse/SPARK-39413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>
> JDBCV2Suite has some test cases that use SQL keywords without capitalization.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39415) Local mode supports HadoopDelegationTokenManager

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551647#comment-17551647
 ] 

Apache Spark commented on SPARK-39415:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/36808

> Local mode supports HadoopDelegationTokenManager
> 
>
> Key: SPARK-39415
> URL: https://issues.apache.org/jira/browse/SPARK-39415
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> Currently, in a Kerberos environment, spark-submit --master=local 
> --proxy-user xxx cannot access the Hive Metastore, and using --keytab will not 
> automatically re-login.
> {code:java}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483)
> {code}
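
For context, a minimal sketch of the proxy-user pattern involved, using only the Hadoop UserGroupInformation API (the principal and keytab path are placeholders; the delegation-token wiring itself is what the proposed change would add, so it is only described in the comments):

{code:java}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// The real (kerberized) user logs in from a keytab...
val realUser = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  "svc_spark@EXAMPLE.COM", "/etc/security/keytabs/svc_spark.keytab")

// ...and impersonates the proxy user. The proxy UGI has no Kerberos TGT of its
// own, so any SASL/GSS handshake (e.g. to the Hive Metastore) fails unless
// delegation tokens obtained by the real user are added to its credentials -
// which is what HadoopDelegationTokenManager arranges on YARN but not in local mode.
val proxyUser = UserGroupInformation.createProxyUser("analyst", realUser)
proxyUser.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // talk to a kerberized service here
  }
})
{code}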



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39415) Local mode supports HadoopDelegationTokenManager

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39415:


Assignee: (was: Apache Spark)

> Local mode supports HadoopDelegationTokenManager
> 
>
> Key: SPARK-39415
> URL: https://issues.apache.org/jira/browse/SPARK-39415
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> Currently, in a Kerberos environment, spark-submit --master=local 
> --proxy-user xxx cannot access the Hive Metastore, and using --keytab will not 
> automatically re-login.
> {code:java}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39415) Local mode supports HadoopDelegationTokenManager

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39415:


Assignee: Apache Spark

> Local mode supports HadoopDelegationTokenManager
> 
>
> Key: SPARK-39415
> URL: https://issues.apache.org/jira/browse/SPARK-39415
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, in a Kerberos environment, spark-submit --master=local 
> --proxy-user xxx cannot access the Hive Metastore, and using --keytab will not 
> automatically re-login.
> {code:java}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39415) Local mode supports HadoopDelegationTokenManager

2022-06-08 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-39415:
---
Summary: Local mode supports HadoopDelegationTokenManager  (was: Local mode 
supports delegationTokenManager)

> Local mode supports HadoopDelegationTokenManager
> 
>
> Key: SPARK-39415
> URL: https://issues.apache.org/jira/browse/SPARK-39415
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> Currently, in a Kerberos environment, spark-submit --master=local 
> --proxy-user xxx cannot access the Hive Metastore, and using --keytab will not 
> automatically re-login.
> {code:java}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39415) Local mode supports delegationTokenManager

2022-06-08 Thread dzcxzl (Jira)
dzcxzl created SPARK-39415:
--

 Summary: Local mode supports delegationTokenManager
 Key: SPARK-39415
 URL: https://issues.apache.org/jira/browse/SPARK-39415
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: dzcxzl


Currently, in a Kerberos environment, spark-submit --master=local --proxy-user 
xxx cannot access the Hive Metastore, and using --keytab will not automatically 
re-login.


{code:java}
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:483)
{code}




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39414) Upgrade Scala to 2.12.16

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39414:


Assignee: (was: Apache Spark)

> Upgrade Scala to 2.12.16
> 
>
> Key: SPARK-39414
> URL: https://issues.apache.org/jira/browse/SPARK-39414
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/scala/scala/releases/tag/v2.12.16



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39414) Upgrade Scala to 2.12.16

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551596#comment-17551596
 ] 

Apache Spark commented on SPARK-39414:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36807

> Upgrade Scala to 2.12.16
> 
>
> Key: SPARK-39414
> URL: https://issues.apache.org/jira/browse/SPARK-39414
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/scala/scala/releases/tag/v2.12.16



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39414) Upgrade Scala to 2.12.16

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39414:


Assignee: Apache Spark

> Upgrade Scala to 2.12.16
> 
>
> Key: SPARK-39414
> URL: https://issues.apache.org/jira/browse/SPARK-39414
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/scala/scala/releases/tag/v2.12.16



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39414) Upgrade Scala to 2.12.16

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551595#comment-17551595
 ] 

Apache Spark commented on SPARK-39414:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36807

> Upgrade Scala to 2.12.16
> 
>
> Key: SPARK-39414
> URL: https://issues.apache.org/jira/browse/SPARK-39414
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/scala/scala/releases/tag/v2.12.16



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39414) Upgrade Scala to 2.12.16

2022-06-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-39414:


 Summary: Upgrade Scala to 2.12.16
 Key: SPARK-39414
 URL: https://issues.apache.org/jira/browse/SPARK-39414
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


https://github.com/scala/scala/releases/tag/v2.12.16



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework

2022-06-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-38852:
---
Description: 
Currently, Spark supports pushing down Filters and Aggregates to the data source.
However, the Data Source V2 operator pushdown framework has the following 
shortcomings:

# Only simple filter and aggregate are supported, which makes it impossible to 
apply in most scenarios
# The incompatibility of SQL syntax makes it impossible to apply in most 
scenarios
# Aggregate push down does not support multiple partitions of data sources
# Spark's additional aggregate will cause some overhead
# Limit push down is not supported
# Top n push down is not supported
# Aggregate push down does not support group by expressions
# Aggregate push down does not support queries that do not use aggregate functions.
# Offset push down is not supported
# Paging push down is not supported

  was:
Currently, Spark supports pushing down Filters and Aggregates to the data source.
However, the Data Source V2 operator pushdown framework has the following 
shortcomings:

# Only simple filter and aggregate are supported, which makes it impossible to 
apply in most scenarios
# The incompatibility of SQL syntax makes it impossible to apply in most 
scenarios
# Aggregate push down does not support multiple partitions of data sources
# Spark's additional aggregate will cause some overhead
# Limit push down is not supported
# Top n push down is not supported
# Aggregate push down does not support group by expressions
# Offset push down is not supported
# Paging push down is not supported


> Better Data Source V2 operator pushdown framework
> -
>
> Key: SPARK-38852
> URL: https://issues.apache.org/jira/browse/SPARK-38852
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark supports pushing down Filters and Aggregates to the data source.
> However, the Data Source V2 operator pushdown framework has the following 
> shortcomings:
> # Only simple filter and aggregate are supported, which makes it impossible 
> to apply in most scenarios
> # The incompatibility of SQL syntax makes it impossible to apply in most 
> scenarios
> # Aggregate push down does not support multiple partitions of data sources
> # Spark's additional aggregate will cause some overhead
> # Limit push down is not supported
> # Top n push down is not supported
> # Aggregate push down does not support group by expressions
> # Aggregate push down does not support queries that do not use aggregate functions.
> # Offset push down is not supported
> # Paging push down is not supported
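
To make items such as limit push down concrete, here is a toy sketch of a scan builder that accepts a pushed limit (the trait and method names are illustrative only, not Spark's actual DataSource V2 interfaces):

{code:java}
// A source that can evaluate LIMIT itself advertises that capability, so Spark
// does not have to read rows only to discard them afterwards.
trait SupportsLimitPushdown {
  /** Returns true if the source will apply the limit, false to keep it in Spark. */
  def pushLimit(limit: Int): Boolean
}

class ExampleScanBuilder(rows: Seq[Int]) extends SupportsLimitPushdown {
  private var pushedLimit: Option[Int] = None

  override def pushLimit(limit: Int): Boolean = {
    pushedLimit = Some(limit)
    true
  }

  // The "scan" produces only the limited rows, e.g. translated to "LIMIT n" for a SQL source.
  def scan(): Seq[Int] = pushedLimit.fold(rows)(rows.take)
}
{code}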



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework

2022-06-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-38852:
---
Description: 
Currently, Spark supports pushing down Filters and Aggregates to the data source.
However, the Data Source V2 operator pushdown framework has the following 
shortcomings:

# Only simple filter and aggregate are supported, which makes it impossible to 
apply in most scenarios
# The incompatibility of SQL syntax makes it impossible to apply in most 
scenarios
# Aggregate push down does not support multiple partitions of data sources
# Spark's additional aggregate will cause some overhead
# Limit push down is not supported
# Top n push down is not supported
# Aggregate push down does not support group by expressions
# Aggregate push down does not support queries that do not use aggregate functions.
# Offset push down is not supported
# Paging push down is not supported

  was:
Currently, Spark supports pushing down Filters and Aggregates to the data source.
However, the Data Source V2 operator pushdown framework has the following 
shortcomings:

# Only simple filter and aggregate are supported, which makes it impossible to 
apply in most scenarios
# The incompatibility of SQL syntax makes it impossible to apply in most 
scenarios
# Aggregate push down does not support multiple partitions of data sources
# Spark's additional aggregate will cause some overhead
# Limit push down is not supported
# Top n push down is not supported
# Aggregate push down does not support group by expressions
# Aggregate push down does not support queries that do not use aggregate functions.
# Offset push down is not supported
# Paging push down is not supported


> Better Data Source V2 operator pushdown framework
> -
>
> Key: SPARK-38852
> URL: https://issues.apache.org/jira/browse/SPARK-38852
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark supports pushing down Filters and Aggregates to the data source.
> However, the Data Source V2 operator pushdown framework has the following 
> shortcomings:
> # Only simple filter and aggregate are supported, which makes it impossible 
> to apply in most scenarios
> # The incompatibility of SQL syntax makes it impossible to apply in most 
> scenarios
> # Aggregate push down does not support multiple partitions of data sources
> # Spark's additional aggregate will cause some overhead
> # Limit push down is not supported
> # Top n push down is not supported
> # Aggregate push down does not support group by expressions
> # Aggregate push down does not support queries that do not use aggregate functions.
> # Offset push down is not supported
> # Paging push down is not supported



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework

2022-06-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-38852:
---
Description: 
Currently, Spark supports pushing down Filters and Aggregates to the data source.
However, the Data Source V2 operator pushdown framework has the following 
shortcomings:

# Only simple filter and aggregate are supported, which makes it impossible to 
apply in most scenarios
# The incompatibility of SQL syntax makes it impossible to apply in most 
scenarios
# Aggregate push down does not support multiple partitions of data sources
# Spark's additional aggregate will cause some overhead
# Limit push down is not supported
# Top n push down is not supported
# Aggregate push down does not support group by expressions
# Offset push down is not supported
# Paging push down is not supported

  was:
Currently, Spark supports pushing down Filters and Aggregates to the data source.
However, the Data Source V2 operator pushdown framework has the following 
shortcomings:

# Only simple filter and aggregate are supported, which makes it impossible to 
apply in most scenarios
# The incompatibility of SQL syntax makes it impossible to apply in most 
scenarios
# Aggregate push down does not support multiple partitions of data sources
# Spark's additional aggregate will cause some overhead
# Limit push down is not supported
# Top n push down is not supported
# Aggregate push down does not support group by expressions
# Offset push down is not supported


> Better Data Source V2 operator pushdown framework
> -
>
> Key: SPARK-38852
> URL: https://issues.apache.org/jira/browse/SPARK-38852
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark supports pushing down Filters and Aggregates to the data source.
> However, the Data Source V2 operator pushdown framework has the following 
> shortcomings:
> # Only simple filter and aggregate are supported, which makes it impossible 
> to apply in most scenarios
> # The incompatibility of SQL syntax makes it impossible to apply in most 
> scenarios
> # Aggregate push down does not support multiple partitions of data sources
> # Spark's additional aggregate will cause some overhead
> # Limit push down is not supported
> # Top n push down is not supported
> # Aggregate push down does not support group by expressions
> # Offset push down is not supported
> # Paging push down is not supported



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39398) when doing iteration compute in graphx, checkpoint need support storagelevel

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39398:


Assignee: (was: Apache Spark)

>  when doing iteration compute in graphx, checkpoint need support storagelevel
> -
>
> Key: SPARK-39398
> URL: https://issues.apache.org/jira/browse/SPARK-39398
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 3.2.1
>Reporter: wangwenli
>Priority: Major
>
> This issue is related to SPARK-30502; that issue only fixed some of the ML 
> algorithms.
> In GraphX Pregel, inside the iterative computation, the message checkpointer 
> should also support setting the storage level, rather than only the default memory-only level.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39398) when doing iteration compute in graphx, checkpoint need support storagelevel

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39398:


Assignee: Apache Spark

>  when doing iteration compute in graphx, checkpoint need support storagelevel
> -
>
> Key: SPARK-39398
> URL: https://issues.apache.org/jira/browse/SPARK-39398
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 3.2.1
>Reporter: wangwenli
>Assignee: Apache Spark
>Priority: Major
>
> This issue is related to SPARK-30502; that issue only fixed some of the ML 
> algorithms.
> In GraphX Pregel, inside the iterative computation, the message checkpointer 
> should also support setting the storage level, rather than only the default memory-only level.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39398) when doing iteration compute in graphx, checkpoint need support storagelevel

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551497#comment-17551497
 ] 

Apache Spark commented on SPARK-39398:
--

User 'wwli05' has created a pull request for this issue:
https://github.com/apache/spark/pull/36806

>  when doing iteration compute in graphx, checkpoint need support storagelevel
> -
>
> Key: SPARK-39398
> URL: https://issues.apache.org/jira/browse/SPARK-39398
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 3.2.1
>Reporter: wangwenli
>Priority: Major
>
> This issue is related to SPARK-30502; that issue only fixed some of the ML 
> algorithms.
> In GraphX Pregel, inside the iterative computation, the message checkpointer 
> should also support setting the storage level, rather than only the default memory-only level.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39411) Release candidates do not have the correct version for PySpark

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39411:
-
Fix Version/s: 3.3.1

> Release candidates do not have the correct version for PySpark
> --
>
> Key: SPARK-39411
> URL: https://issues.apache.org/jira/browse/SPARK-39411
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 3.3.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Critical
> Fix For: 3.4.0, 3.3.1
>
>
> https://github.com/apache/spark/blob/v3.3.0-rc5/dev/create-release/release-tag.sh#L88
>  fails to replace the version in 
> https://github.com/apache/spark/blob/v3.3.0-rc5/python/pyspark/version.py#L19 
> because we now have the {code}: str ={code} type hint ...
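
For illustration, the failure mode can be sketched with a small regex example: a pattern that expects no annotation stops matching once the line gains the type hint, while a pattern that treats the annotation as optional handles both (this is only a sketch; the actual fix lives in the release-tag.sh script):

{code:java}
// The version line in python/pyspark/version.py, before and after the type hint was added.
val oldStyle = "__version__ = \"3.3.0.dev0\""
val newStyle = "__version__: str = \"3.3.0.dev0\""

// A pattern that assumes there is no annotation only matches the old style...
val strict  = "__version__ = \".*\"".r
// ...whereas treating the annotation as optional matches both.
val lenient = "__version__(: str)? = \".*\"".r

def setVersion(line: String, v: String): String =
  lenient.replaceAllIn(line, "__version__: str = \"" + v + "\"")

// strict.findFirstIn(newStyle)  => None (the release script's substitution silently does nothing)
// setVersion(newStyle, "3.3.0") => __version__: str = "3.3.0"
{code}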



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39404) Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame

2022-06-08 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-39404.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36801
[https://github.com/apache/spark/pull/36801]

> Unable to query _metadata in streaming if getBatch returns multiple logical 
> nodes in the DataFrame
> --
>
> Key: SPARK-39404
> URL: https://issues.apache.org/jira/browse/SPARK-39404
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Yaohua Zhao
>Assignee: Yaohua Zhao
>Priority: Major
> Fix For: 3.4.0
>
>
> Here: 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L585]
>  
> We should probably `transform` instead of `match`
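
As background on why `transform` behaves differently from a plain `match`: a top-level pattern match only rewrites the root node, while `transform` applies the rule to every node in the tree, so it still works when getBatch returns a DataFrame with several logical nodes. A toy sketch (generic trees, not Catalyst's actual TreeNode/LogicalPlan API):

{code:java}
// Toy plan node with a transform that applies a partial function to every
// node in the tree, not just the root.
case class Node(name: String, children: Seq[Node] = Nil) {
  def transform(rule: PartialFunction[Node, Node]): Node = {
    val withNewChildren = copy(children = children.map(_.transform(rule)))
    rule.applyOrElse(withNewChildren, identity[Node])
  }
}

val plan = Node("Project", Seq(Node("Union", Seq(Node("Relation"), Node("Relation")))))

// Root-only match: nothing is rewritten unless the root itself is a Relation.
val matched = plan match {
  case n @ Node("Relation", _) => n.copy(name = "RelationWithMetadata")
  case other                   => other
}

// transform reaches the nested Relation nodes as well.
val transformed = plan.transform {
  case n @ Node("Relation", _) => n.copy(name = "RelationWithMetadata")
}
{code}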



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39404) Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame

2022-06-08 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-39404:


Assignee: Yaohua Zhao

> Unable to query _metadata in streaming if getBatch returns multiple logical 
> nodes in the DataFrame
> --
>
> Key: SPARK-39404
> URL: https://issues.apache.org/jira/browse/SPARK-39404
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Yaohua Zhao
>Assignee: Yaohua Zhao
>Priority: Major
>
> Here: 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L585]
>  
> We should probably `transform` instead of `match`



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2022-06-08 Thread Igor Berman (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551465#comment-17551465
 ] 

Igor Berman edited comment on SPARK-23207 at 6/8/22 8:36 AM:
-

We are still facing this issue in production with v3.1.2 at very large 
workloads. It happens very rarely, but it still happens.
Our attempts to reproduce the problem with the reproduction above have failed, so 
at this point we have no reproduction; we will update if we find one.

We are running on Mesos with dynamic allocation.



was (Author: igor.berman):
We are still facing this issue in production with v3.1.2 at very large 
workloads. It happens very rarely, but it still happens.
Our attempts to reproduce the problem with the reproduction above have failed, so 
at this point we have no reproduction; we will update if we find one.

> Shuffle+Repartition on an DataFrame could lead to incorrect answers
> ---
>
> Key: SPARK-23207
> URL: https://issues.apache.org/jira/browse/SPARK-23207
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.1.4, 2.2.3, 2.3.0
>
>
> Currently shuffle repartition uses RoundRobinPartitioning; the generated 
> result is nondeterministic since the order of the input rows is not 
> determined.
> The bug can be triggered when there is a repartition call following a shuffle 
> (which would lead to non-deterministic row ordering), as the pattern shows 
> below:
> upstream stage -> repartition stage -> result stage
> (-> indicates a shuffle)
> When one of the executor processes goes down, some tasks on the repartition 
> stage will be retried and generate an inconsistent ordering, and some tasks of 
> the result stage will be retried, generating different data.
> The following code returns 931532 instead of 1,000,000:
> {code:java}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x =>
>   x
> }.repartition(200).map { x =>
>   if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) {
> throw new Exception("pkill -f java".!!)
>   }
>   x
> }
> res.distinct().count()
> {code}
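
For comparison, a user-level alternative that avoids the round-robin nondeterminism (this is not the fix that went into Spark, just an illustration): hash-repartitioning on an explicit key makes each row's target partition a pure function of the row itself, so retried tasks reproduce the same placement.

{code:java}
import org.apache.spark.sql.functions.col

// Same data as the reproduction above (assumes the ambient `spark` session of spark-shell).
val df = spark.range(0, 1000 * 1000, 1).toDF("id")

// df.repartition(200) places rows by arrival order, which is not deterministic
// after a shuffle; partitioning by the value of "id" is order-independent.
val byKey = df.repartition(200, col("id"))
{code}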



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2022-06-08 Thread Igor Berman (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551465#comment-17551465
 ] 

Igor Berman commented on SPARK-23207:
-

We are still facing this issue in production with v3.1.2 at very large 
workloads. It happens very rarely, but it still happens.
Our attempts to reproduce the problem with the reproduction above have failed, so 
at this point we have no reproduction; we will update if we find one.

> Shuffle+Repartition on an DataFrame could lead to incorrect answers
> ---
>
> Key: SPARK-23207
> URL: https://issues.apache.org/jira/browse/SPARK-23207
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.1.4, 2.2.3, 2.3.0
>
>
> Currently shuffle repartition uses RoundRobinPartitioning; the generated 
> result is nondeterministic since the order of the input rows is not 
> determined.
> The bug can be triggered when there is a repartition call following a shuffle 
> (which would lead to non-deterministic row ordering), as the pattern shows 
> below:
> upstream stage -> repartition stage -> result stage
> (-> indicates a shuffle)
> When one of the executor processes goes down, some tasks on the repartition 
> stage will be retried and generate an inconsistent ordering, and some tasks of 
> the result stage will be retried, generating different data.
> The following code returns 931532 instead of 1,000,000:
> {code:java}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x =>
>   x
> }.repartition(200).map { x =>
>   if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) {
> throw new Exception("pkill -f java".!!)
>   }
>   x
> }
> res.distinct().count()
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39412) IllegalStateException from connector does not work well with error class framework

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39412:


Assignee: (was: Apache Spark)

> IllegalStateException from connector does not work well with error class 
> framework
> --
>
> Key: SPARK-39412
> URL: https://issues.apache.org/jira/browse/SPARK-39412
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Blocker
> Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, 
> kafka-dataloss-error-msg-in-spark-3-3-or-master.log
>
>
> With SPARK-39346, Spark SQL binds several exceptions to the internal error, 
> and produces different guidance on dealing with the exception. This assumes 
> these exceptions are only used for noticing internal bugs.
> This applies to "connectors" as well, and introduces a side effect on the error 
> log. For the Kafka data source it is a breaking and unacceptable change, because 
> there is an important use case where the Kafka data source detects a case of 
> "dataloss" and throws an IllegalStateException with an instruction message on the 
> workaround.
> I mentioned this as an "important" use case because it can happen even in 
> valid scenarios - the streaming query has a maintenance period and 
> Kafka's topic retention removes some records in the meantime.
> Two problems arise:
> 1) This does not mean Spark has a bug that end users have to report, hence the 
> guidance message on the internal error is misleading.
> 2) Most importantly, the instruction message is shown only after a long stack trace. 
> With the modification of the existing test suite, I see the message 
> appearing at "line 90" of the error log.
> We should roll the right error message back, at least for Kafka's case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39412) IllegalStateException from connector does not work well with error class framework

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39412:


Assignee: Apache Spark

> IllegalStateException from connector does not work well with error class 
> framework
> --
>
> Key: SPARK-39412
> URL: https://issues.apache.org/jira/browse/SPARK-39412
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Blocker
> Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, 
> kafka-dataloss-error-msg-in-spark-3-3-or-master.log
>
>
> With SPARK-39346, Spark SQL binds several exceptions to the internal error, 
> and produces different guidance on dealing with the exception. This assumes 
> these exceptions are only used for noticing internal bugs.
> This applies to "connectors" as well, and introduces a side effect on the error 
> log. For the Kafka data source it is a breaking and unacceptable change, because 
> there is an important use case where the Kafka data source detects a case of 
> "dataloss" and throws an IllegalStateException with an instruction message on the 
> workaround.
> I mentioned this as an "important" use case because it can happen even in 
> valid scenarios - the streaming query has a maintenance period and 
> Kafka's topic retention removes some records in the meantime.
> Two problems arise:
> 1) This does not mean Spark has a bug that end users have to report, hence the 
> guidance message on the internal error is misleading.
> 2) Most importantly, the instruction message is shown only after a long stack trace. 
> With the modification of the existing test suite, I see the message 
> appearing at "line 90" of the error log.
> We should roll the right error message back, at least for Kafka's case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39413:


Assignee: (was: Apache Spark)

> Capitalize sql keywords in JDBCV2Suite
> --
>
> Key: SPARK-39413
> URL: https://issues.apache.org/jira/browse/SPARK-39413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> JDBCV2Suite has some test cases that use SQL keywords without capitalization.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39413:


Assignee: Apache Spark

> Capitalize sql keywords in JDBCV2Suite
> --
>
> Key: SPARK-39413
> URL: https://issues.apache.org/jira/browse/SPARK-39413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> JDBCV2Suite has some test cases that use SQL keywords without capitalization.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551464#comment-17551464
 ] 

Apache Spark commented on SPARK-39413:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/36805

> Capitalize sql keywords in JDBCV2Suite
> --
>
> Key: SPARK-39413
> URL: https://issues.apache.org/jira/browse/SPARK-39413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> JDBCV2Suite has some test cases that use SQL keywords without capitalization.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39412) IllegalStateException from connector does not work well with error class framework

2022-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551463#comment-17551463
 ] 

Apache Spark commented on SPARK-39412:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36804

> IllegalStateException from connector does not work well with error class 
> framework
> --
>
> Key: SPARK-39412
> URL: https://issues.apache.org/jira/browse/SPARK-39412
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Blocker
> Attachments: kafka-dataloss-error-msg-in-spark-3-2.log, 
> kafka-dataloss-error-msg-in-spark-3-3-or-master.log
>
>
> With SPARK-39346, Spark SQL binds several exceptions to the internal error, 
> and produces different guidance on dealing with the exception. This assumes 
> these exceptions are only used for noticing internal bugs.
> This applies to "connectors" as well, and introduces a side effect on the error 
> log. For the Kafka data source it is a breaking and unacceptable change, because 
> there is an important use case where the Kafka data source detects a case of 
> "dataloss" and throws an IllegalStateException with an instruction message on the 
> workaround.
> I mentioned this as an "important" use case because it can happen even in 
> valid scenarios - the streaming query has a maintenance period and 
> Kafka's topic retention removes some records in the meantime.
> Two problems arise:
> 1) This does not mean Spark has a bug that end users have to report, hence the 
> guidance message on the internal error is misleading.
> 2) Most importantly, the instruction message is shown only after a long stack trace. 
> With the modification of the existing test suite, I see the message 
> appearing at "line 90" of the error log.
> We should roll the right error message back, at least for Kafka's case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39411) Release candidates do not have the correct version for PySpark

2022-06-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39411.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36803
[https://github.com/apache/spark/pull/36803]

> Release candidates do not have the correct version for PySpark
> --
>
> Key: SPARK-39411
> URL: https://issues.apache.org/jira/browse/SPARK-39411
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 3.3.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Critical
> Fix For: 3.4.0
>
>
> https://github.com/apache/spark/blob/v3.3.0-rc5/dev/create-release/release-tag.sh#L88
>  fails to replace the version in 
> https://github.com/apache/spark/blob/v3.3.0-rc5/python/pyspark/version.py#L19 
> because we now have the {code}: str ={code} type hint ...



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


