[jira] [Assigned] (SPARK-37627) Add sorted column in BucketTransform

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37627:


Assignee: Apache Spark

> Add sorted column in BucketTransform
> 
>
> Key: SPARK-37627
> URL: https://issues.apache.org/jira/browse/SPARK-37627
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Minor
>
> In V1, we can create a table with sorted buckets like the following:
> {code:java}
>   sql("CREATE TABLE tbl(a INT, b INT) USING parquet " +
> "CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS")
> {code}
> However, creating a table with sorted buckets in V2 fails with an exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: Cannot convert bucketing with sort 
> columns to a transform.
> {code}
> We should be able to create tables with sorted buckets in V2.






[jira] [Assigned] (SPARK-37627) Add sorted column in BucketTransform

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37627:


Assignee: (was: Apache Spark)

> Add sorted column in BucketTransform
> 
>
> Key: SPARK-37627
> URL: https://issues.apache.org/jira/browse/SPARK-37627
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> In V1, we can create a table with sorted buckets like the following:
> {code:java}
>   sql("CREATE TABLE tbl(a INT, b INT) USING parquet " +
> "CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS")
> {code}
> However, creating a table with sorted buckets in V2 fails with an exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: Cannot convert bucketing with sort 
> columns to a transform.
> {code}
> We should be able to create tables with sorted buckets in V2.






[jira] [Commented] (SPARK-37627) Add sorted column in BucketTransform

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458185#comment-17458185
 ] 

Apache Spark commented on SPARK-37627:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34879

> Add sorted column in BucketTransform
> 
>
> Key: SPARK-37627
> URL: https://issues.apache.org/jira/browse/SPARK-37627
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> In V1, we can create a table with sorted buckets like the following:
> {code:java}
>   sql("CREATE TABLE tbl(a INT, b INT) USING parquet " +
> "CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS")
> {code}
> However, creating a table with sorted buckets in V2 fails with an exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: Cannot convert bucketing with sort 
> columns to a transform.
> {code}
> We should be able to create tables with sorted buckets in V2.






[jira] [Created] (SPARK-37627) Add sorted column in BucketTransform

2021-12-12 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-37627:
--

 Summary: Add sorted column in BucketTransform
 Key: SPARK-37627
 URL: https://issues.apache.org/jira/browse/SPARK-37627
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Huaxin Gao


In V1, we can create a table with sorted buckets like the following:

{code:java}
  sql("CREATE TABLE tbl(a INT, b INT) USING parquet " +
"CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS")
{code}

However, creating a table with sorted buckets in V2 fails with an exception:

{code:java}
org.apache.spark.sql.AnalysisException: Cannot convert bucketing with sort 
columns to a transform.
{code}

We should be able to create tables with sorted buckets in V2.
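
A minimal sketch of the failing V2 path, assuming a DataSourceV2 test catalog registered as "testcat" (the catalog name and the implementation class below are illustrative, not taken from the report):

{code:java}
// Register an in-memory V2 catalog (a test-only class; any V2 catalog
// should hit the same transform-conversion code path).
spark.conf.set("spark.sql.catalog.testcat",
  "org.apache.spark.sql.connector.catalog.InMemoryTableCatalog")

// The same DDL that works in V1 fails once it targets a V2 catalog:
sql("CREATE TABLE testcat.tbl (a INT, b INT) " +
  "CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS")
// => org.apache.spark.sql.AnalysisException:
//    Cannot convert bucketing with sort columns to a transform.
{code}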







[jira] [Comment Edited] (SPARK-37626) Upgrade libthrift to 0.15.0

2021-12-12 Thread Bo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458178#comment-17458178
 ] 

Bo Zhang edited comment on SPARK-37626 at 12/13/21, 7:26 AM:
-

0.16.0 is not released yet. Could we upgrade to 0.15.0 first?



was (Author: bozhang):
0.16.0 is not released yet. Could we upgrade to 0.15.0 first?

Here is the PR for that: https://github.com/apache/spark/pull/34878

> Upgrade libthrift to 0.15.0
> ---
>
> Key: SPARK-37626
> URL: https://issues.apache.org/jira/browse/SPARK-37626
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
> Fix For: 3.3.0
>
>
> Upgrade libthrift to 0.15.0 in order to avoid 
> https://nvd.nist.gov/vuln/detail/CVE-2020-13949.






[jira] [Assigned] (SPARK-37626) Upgrade libthrift to 0.15.0

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37626:


Assignee: Apache Spark

> Upgrade libthrift to 0.15.0
> ---
>
> Key: SPARK-37626
> URL: https://issues.apache.org/jira/browse/SPARK-37626
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>
> Upgrade libthrift to 0.15.0 in order to avoid 
> https://nvd.nist.gov/vuln/detail/CVE-2020-13949.






[jira] [Assigned] (SPARK-37626) Upgrade libthrift to 0.15.0

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37626:


Assignee: (was: Apache Spark)

> Upgrade libthrift to 0.15.0
> ---
>
> Key: SPARK-37626
> URL: https://issues.apache.org/jira/browse/SPARK-37626
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
> Fix For: 3.3.0
>
>
> Upgrade libthrift to 0.15.0 in order to avoid 
> https://nvd.nist.gov/vuln/detail/CVE-2020-13949.






[jira] [Commented] (SPARK-37626) Upgrade libthrift to 0.15.0

2021-12-12 Thread Bo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458178#comment-17458178
 ] 

Bo Zhang commented on SPARK-37626:
--

0.16.0 is not released yet. Could we upgrade to 0.15.0 first?

Here is the PR for that: https://github.com/apache/spark/pull/34878

> Upgrade libthrift to 0.15.0
> ---
>
> Key: SPARK-37626
> URL: https://issues.apache.org/jira/browse/SPARK-37626
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
> Fix For: 3.3.0
>
>
> Upgrade libthrift to 0.15.0 in order to avoid 
> https://nvd.nist.gov/vuln/detail/CVE-2020-13949.






[jira] [Commented] (SPARK-37626) Upgrade libthrift to 0.15.0

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458177#comment-17458177
 ] 

Apache Spark commented on SPARK-37626:
--

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/34878

> Upgrade libthrift to 0.15.0
> ---
>
> Key: SPARK-37626
> URL: https://issues.apache.org/jira/browse/SPARK-37626
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
> Fix For: 3.3.0
>
>
> Upgrade libthrift to 0.15.0 in order to avoid 
> https://nvd.nist.gov/vuln/detail/CVE-2020-13949.






[jira] [Updated] (SPARK-37626) Upgrade libthrift to 0.15.0

2021-12-12 Thread Bo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Zhang updated SPARK-37626:
-
Summary: Upgrade libthrift to 0.15.0  (was: Upgrade libthrift to 1.15.0)

> Upgrade libthrift to 0.15.0
> ---
>
> Key: SPARK-37626
> URL: https://issues.apache.org/jira/browse/SPARK-37626
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
> Fix For: 3.3.0
>
>
> Upgrade libthrift to 0.15.0 in order to avoid 
> https://nvd.nist.gov/vuln/detail/CVE-2020-13949.






[jira] [Commented] (SPARK-37626) Upgrade libthrift to 1.15.0

2021-12-12 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458173#comment-17458173
 ] 

Yuming Wang commented on SPARK-37626:
-

We need to upgrade to 0.16.0 because we need [this 
patch|https://github.com/apache/thrift/pull/2470].

> Upgrade libthrift to 1.15.0
> ---
>
> Key: SPARK-37626
> URL: https://issues.apache.org/jira/browse/SPARK-37626
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
> Fix For: 3.3.0
>
>
> Upgrade libthrift to 1.15.0 in order to avoid 
> https://nvd.nist.gov/vuln/detail/CVE-2020-13949.






[jira] [Updated] (SPARK-37577) ClassCastException: ArrayType cannot be cast to StructType

2021-12-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37577:
--
Environment: (was: Py: 3.9)

> ClassCastException: ArrayType cannot be cast to StructType
> --
>
> Key: SPARK-37577
> URL: https://issues.apache.org/jira/browse/SPARK-37577
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Rafal Wojdyla
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.3.0
>
>
> Reproduction:
> {code:python}
> import pyspark.sql.functions as F
> from pyspark.sql.types import StructType, StructField, ArrayType, StringType
> t = StructType([StructField('o', ArrayType(StructType([StructField('s',
>StringType(), False), StructField('b',
>ArrayType(StructType([StructField('e', StringType(),
>False)]), True), False)]), True), False)])
> (
> spark.createDataFrame([], schema=t)
> .select(F.explode("o").alias("eo"))
> .select("eo.*")
> .select(F.explode("b"))
> .count()
> )
> {code}
> The code above works fine in 3.1.2 but fails in 3.2.0. See the stack trace 
> below. Note that if you remove field {{s}}, the code works fine, which is a 
> bit unexpected and likely a clue.
> {noformat}
> Py4JJavaError: An error occurred while calling o156.count.
> : java.lang.ClassCastException: class org.apache.spark.sql.types.ArrayType 
> cannot be cast to class org.apache.spark.sql.types.StructType 
> (org.apache.spark.sql.types.ArrayType and 
> org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema$lzycompute(complexTypeExtractors.scala:107)
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema(complexTypeExtractors.scala:107)
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.$anonfun$extractFieldName$1(complexTypeExtractors.scala:117)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.extractFieldName(complexTypeExtractors.scala:117)
>   at 
> org.apache.spark.sql.catalyst.optimizer.GeneratorNestedColumnAliasing$$anonfun$1$$anonfun$2.applyOrElse(NestedColumnAliasing.scala:372)
>   at 
> org.apache.spark.sql.catalyst.optimizer.GeneratorNestedColumnAliasing$$anonfun$1$$anonfun$2.applyOrElse(NestedColumnAliasing.scala:368)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$4(TreeNode.scala:539)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:539)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:508)
>   at 
> org.apache.spark.sql.catalyst.optimizer.GeneratorNestedColumnAliasing$$anonfun$1.applyOrElse(NestedColumnAliasing.scala:368)
>   at 
> org.apache.spark.sql.catalyst.optimizer.GeneratorNestedColumnAliasing$$anonfun$1.applyOrElse(NestedColumnAliasing.scala:366)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:152)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDownWithPruning(QueryPlan.scala:152)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsWithPruning(QueryPlan.scala:123)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:101)
>   at 
> org.apache.spark.sql.catalyst.optimizer.GeneratorNestedColumnAliasing$.unapply(NestedColumnAliasing.scala:366)
>   at 
> 

[jira] [Resolved] (SPARK-37577) ClassCastException: ArrayType cannot be cast to StructType

2021-12-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37577.
---
Fix Version/s: 3.3.0
 Assignee: L. C. Hsieh
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/34845
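
For anyone pinned to 3.2.0 until the fix ships, one possible mitigation (an untested assumption, based on the failing rule in the stack trace belonging to nested-column pruning) is to disable expression-level nested pruning:

{code:java}
// Unverified workaround sketch for 3.2.0: GeneratorNestedColumnAliasing,
// where the bad cast happens, is part of the nested-column pruning
// optimizations, so turning off pruning on expressions may avoid it.
spark.conf.set("spark.sql.optimizer.expression.nestedPruning.enabled", "false")
{code}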

> ClassCastException: ArrayType cannot be cast to StructType
> --
>
> Key: SPARK-37577
> URL: https://issues.apache.org/jira/browse/SPARK-37577
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
> Environment: Py: 3.9
>Reporter: Rafal Wojdyla
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.3.0
>
>
> Reproduction:
> {code:python}
> import pyspark.sql.functions as F
> from pyspark.sql.types import StructType, StructField, ArrayType, StringType
> t = StructType([StructField('o', ArrayType(StructType([StructField('s',
>StringType(), False), StructField('b',
>ArrayType(StructType([StructField('e', StringType(),
>False)]), True), False)]), True), False)])
> (
> spark.createDataFrame([], schema=t)
> .select(F.explode("o").alias("eo"))
> .select("eo.*")
> .select(F.explode("b"))
> .count()
> )
> {code}
> The code above works fine in 3.1.2 but fails in 3.2.0. See the stack trace 
> below. Note that if you remove field {{s}}, the code works fine, which is a 
> bit unexpected and likely a clue.
> {noformat}
> Py4JJavaError: An error occurred while calling o156.count.
> : java.lang.ClassCastException: class org.apache.spark.sql.types.ArrayType 
> cannot be cast to class org.apache.spark.sql.types.StructType 
> (org.apache.spark.sql.types.ArrayType and 
> org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema$lzycompute(complexTypeExtractors.scala:107)
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema(complexTypeExtractors.scala:107)
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.$anonfun$extractFieldName$1(complexTypeExtractors.scala:117)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.extractFieldName(complexTypeExtractors.scala:117)
>   at 
> org.apache.spark.sql.catalyst.optimizer.GeneratorNestedColumnAliasing$$anonfun$1$$anonfun$2.applyOrElse(NestedColumnAliasing.scala:372)
>   at 
> org.apache.spark.sql.catalyst.optimizer.GeneratorNestedColumnAliasing$$anonfun$1$$anonfun$2.applyOrElse(NestedColumnAliasing.scala:368)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$4(TreeNode.scala:539)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:539)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:508)
>   at 
> org.apache.spark.sql.catalyst.optimizer.GeneratorNestedColumnAliasing$$anonfun$1.applyOrElse(NestedColumnAliasing.scala:368)
>   at 
> org.apache.spark.sql.catalyst.optimizer.GeneratorNestedColumnAliasing$$anonfun$1.applyOrElse(NestedColumnAliasing.scala:366)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:152)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDownWithPruning(QueryPlan.scala:152)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsWithPruning(QueryPlan.scala:123)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:101)
>   at 
> 

[jira] [Created] (SPARK-37626) Upgrade libthrift to 1.15.0

2021-12-12 Thread Bo Zhang (Jira)
Bo Zhang created SPARK-37626:


 Summary: Upgrade libthrift to 1.15.0
 Key: SPARK-37626
 URL: https://issues.apache.org/jira/browse/SPARK-37626
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.3.0
Reporter: Bo Zhang
 Fix For: 3.3.0


Upgrade libthrift to 1.15.0 in order to avoid 
https://nvd.nist.gov/vuln/detail/CVE-2020-13949.
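
For downstream builds that want the patched client before a Spark release picks it up, a sketch of the pin in sbt, using the corrected version 0.15.0 (the "1.15.0" above is the typo fixed later in this thread; the sbt form is illustrative, as Spark itself manages this in its Maven build):

{code:java}
// build.sbt sketch: force the Thrift client that contains the
// CVE-2020-13949 fix onto the classpath.
dependencyOverrides += "org.apache.thrift" % "libthrift" % "0.15.0"
{code}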






[jira] [Commented] (SPARK-37625) update log4j to 2.15

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458164#comment-17458164
 ] 

Apache Spark commented on SPARK-37625:
--

User 'qwe' has created a pull request for this issue:
https://github.com/apache/spark/pull/34877

> update log4j to 2.15 
> -
>
> Key: SPARK-37625
> URL: https://issues.apache.org/jira/browse/SPARK-37625
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: weifeng zhang
>Priority: Major
>







[jira] [Assigned] (SPARK-37625) update log4j to 2.15

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37625:


Assignee: Apache Spark

> update log4j to 2.15 
> -
>
> Key: SPARK-37625
> URL: https://issues.apache.org/jira/browse/SPARK-37625
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: weifeng zhang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-37625) update log4j to 2.15

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458163#comment-17458163
 ] 

Apache Spark commented on SPARK-37625:
--

User 'qwe' has created a pull request for this issue:
https://github.com/apache/spark/pull/34877

> update log4j to 2.15 
> -
>
> Key: SPARK-37625
> URL: https://issues.apache.org/jira/browse/SPARK-37625
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: weifeng zhang
>Priority: Major
>







[jira] [Assigned] (SPARK-37625) update log4j to 2.15

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37625:


Assignee: (was: Apache Spark)

> update log4j to 2.15 
> -
>
> Key: SPARK-37625
> URL: https://issues.apache.org/jira/browse/SPARK-37625
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: weifeng zhang
>Priority: Major
>







[jira] [Created] (SPARK-37625) update log4j to 2.15

2021-12-12 Thread weifeng zhang (Jira)
weifeng zhang created SPARK-37625:
-

 Summary: update log4j to 2.15 
 Key: SPARK-37625
 URL: https://issues.apache.org/jira/browse/SPARK-37625
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: weifeng zhang









[jira] [Reopened] (SPARK-36571) Optimized FileOutputCommitter with StagingDir

2021-12-12 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu reopened SPARK-36571:
---

> Optimized FileOutputCommitter with StagingDir
> -
>
> Key: SPARK-36571
> URL: https://issues.apache.org/jira/browse/SPARK-36571
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>







[jira] [Resolved] (SPARK-36995) Add new SQLFileCommitProtocol

2021-12-12 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu resolved SPARK-36995.
---
Resolution: Duplicate

> Add new SQLFileCommitProtocol
> -
>
> Key: SPARK-36995
> URL: https://issues.apache.org/jira/browse/SPARK-36995
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>







[jira] [Resolved] (SPARK-36571) Optimized FileOutputCommitter with StagingDir

2021-12-12 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu resolved SPARK-36571.
---
Resolution: Duplicate

> Optimized FileOutputCommitter with StagingDir
> -
>
> Key: SPARK-36571
> URL: https://issues.apache.org/jira/browse/SPARK-36571
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>







[jira] [Assigned] (SPARK-37624) Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37624:


Assignee: (was: Apache Spark)

> Suppress warnings for live pandas-on-Spark quickstart notebooks
> ---
>
> Key: SPARK-37624
> URL: https://issues.apache.org/jira/browse/SPARK-37624
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
> Attachments: Screen Shot 2021-12-13 at 1.32.05 PM.png, Screen Shot 
> 2021-12-13 at 2.02.20 PM.png, Screen Shot 2021-12-13 at 2.02.25 PM.png, 
> Screen Shot 2021-12-13 at 2.02.45 PM.png
>
>
> https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
>  shows a bunch of warnings. We had better hide them, since it's a quickstart 
> guide for users.






[jira] [Commented] (SPARK-37624) Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458153#comment-17458153
 ] 

Apache Spark commented on SPARK-37624:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34875

> Suppress warnings for live pandas-on-Spark quickstart notebooks
> ---
>
> Key: SPARK-37624
> URL: https://issues.apache.org/jira/browse/SPARK-37624
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
> Attachments: Screen Shot 2021-12-13 at 1.32.05 PM.png, Screen Shot 
> 2021-12-13 at 2.02.20 PM.png, Screen Shot 2021-12-13 at 2.02.25 PM.png, 
> Screen Shot 2021-12-13 at 2.02.45 PM.png
>
>
> https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
>  shows a bunch of warnings. We had better hide them, since it's a quickstart 
> guide for users.






[jira] [Assigned] (SPARK-37624) Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37624:


Assignee: Apache Spark

> Suppress warnings for live pandas-on-Spark quickstart notebooks
> ---
>
> Key: SPARK-37624
> URL: https://issues.apache.org/jira/browse/SPARK-37624
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
> Attachments: Screen Shot 2021-12-13 at 1.32.05 PM.png, Screen Shot 
> 2021-12-13 at 2.02.20 PM.png, Screen Shot 2021-12-13 at 2.02.25 PM.png, 
> Screen Shot 2021-12-13 at 2.02.45 PM.png
>
>
> https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
>  shows a bunch of warnings. We had better hide them, since it's a quickstart 
> guide for users.






[jira] [Updated] (SPARK-37624) Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37624:
-
Summary: Suppress warnings for live pandas-on-Spark quickstart notebooks  
(was: Surppress warnings for live pandas-on-Spark quickstart notebooks)

> Suppress warnings for live pandas-on-Spark quickstart notebooks
> ---
>
> Key: SPARK-37624
> URL: https://issues.apache.org/jira/browse/SPARK-37624
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
> Attachments: Screen Shot 2021-12-13 at 1.32.05 PM.png, Screen Shot 
> 2021-12-13 at 2.02.20 PM.png, Screen Shot 2021-12-13 at 2.02.25 PM.png, 
> Screen Shot 2021-12-13 at 2.02.45 PM.png
>
>
> https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
>  shows a bunch of warnings. We had better hide them, since it's a quickstart 
> guide for users.






[jira] [Updated] (SPARK-37623) Support ANSI Aggregate Function: regr_slope & regr_intercept

2021-12-12 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-37623:
---
Summary: Support ANSI Aggregate Function: regr_slope & regr_intercept  
(was: Support ANSI Aggregate Function: regr_slope)

> Support ANSI Aggregate Function: regr_slope & regr_intercept
> 
>
> Key: SPARK-37623
> URL: https://issues.apache.org/jira/browse/SPARK-37623
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> REGR_SLOPE is an ANSI aggregate function; many databases support it.






[jira] [Updated] (SPARK-37624) Surppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37624:
-
Attachment: Screen Shot 2021-12-13 at 2.02.25 PM.png

> Surppress warnings for live pandas-on-Spark quickstart notebooks
> 
>
> Key: SPARK-37624
> URL: https://issues.apache.org/jira/browse/SPARK-37624
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
> Attachments: Screen Shot 2021-12-13 at 1.32.05 PM.png, Screen Shot 
> 2021-12-13 at 2.02.20 PM.png, Screen Shot 2021-12-13 at 2.02.25 PM.png, 
> Screen Shot 2021-12-13 at 2.02.45 PM.png
>
>
> https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
>  shows a bunch of warnings. We had better hide them, since it's a quickstart 
> guide for users.






[jira] [Updated] (SPARK-37624) Surppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37624:
-
Attachment: Screen Shot 2021-12-13 at 1.32.05 PM.png

> Surppress warnings for live pandas-on-Spark quickstart notebooks
> 
>
> Key: SPARK-37624
> URL: https://issues.apache.org/jira/browse/SPARK-37624
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
> Attachments: Screen Shot 2021-12-13 at 1.32.05 PM.png, Screen Shot 
> 2021-12-13 at 2.02.20 PM.png, Screen Shot 2021-12-13 at 2.02.25 PM.png, 
> Screen Shot 2021-12-13 at 2.02.45 PM.png
>
>
> https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
>  shows a bunch of warnings. We had better hide them, since it's a quickstart 
> guide for users.






[jira] [Updated] (SPARK-37624) Surppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37624:
-
Attachment: Screen Shot 2021-12-13 at 2.02.45 PM.png

> Surppress warnings for live pandas-on-Spark quickstart notebooks
> 
>
> Key: SPARK-37624
> URL: https://issues.apache.org/jira/browse/SPARK-37624
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
> Attachments: Screen Shot 2021-12-13 at 1.32.05 PM.png, Screen Shot 
> 2021-12-13 at 2.02.20 PM.png, Screen Shot 2021-12-13 at 2.02.25 PM.png, 
> Screen Shot 2021-12-13 at 2.02.45 PM.png
>
>
> https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
>  shows a bunch of warnings. We had better hide them, since it's a quickstart 
> guide for users.






[jira] [Updated] (SPARK-37624) Surppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37624:
-
Priority: Minor  (was: Major)

> Surppress warnings for live pandas-on-Spark quickstart notebooks
> 
>
> Key: SPARK-37624
> URL: https://issues.apache.org/jira/browse/SPARK-37624
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
> Attachments: Screen Shot 2021-12-13 at 1.32.05 PM.png, Screen Shot 
> 2021-12-13 at 2.02.20 PM.png, Screen Shot 2021-12-13 at 2.02.25 PM.png, 
> Screen Shot 2021-12-13 at 2.02.45 PM.png
>
>
> https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
>  shows a bunch of warnings. We had better hide them, since it's a quickstart 
> guide for users.






[jira] [Updated] (SPARK-37624) Surppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37624:
-
Attachment: Screen Shot 2021-12-13 at 2.02.20 PM.png

> Surppress warnings for live pandas-on-Spark quickstart notebooks
> 
>
> Key: SPARK-37624
> URL: https://issues.apache.org/jira/browse/SPARK-37624
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
> Attachments: Screen Shot 2021-12-13 at 1.32.05 PM.png, Screen Shot 
> 2021-12-13 at 2.02.20 PM.png, Screen Shot 2021-12-13 at 2.02.25 PM.png, 
> Screen Shot 2021-12-13 at 2.02.45 PM.png
>
>
> https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
>  shows a bunch of warnings. We had better hide them, since it's a quickstart 
> guide for users.






[jira] [Created] (SPARK-37624) Surppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-37624:


 Summary: Surppress warnings for live pandas-on-Spark quickstart 
notebooks
 Key: SPARK-37624
 URL: https://issues.apache.org/jira/browse/SPARK-37624
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, PySpark
Affects Versions: 3.2.0, 3.3.0
Reporter: Hyukjin Kwon


https://mybinder.org/v2/gh/apache/spark/9e614e265f?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
 shows a bunch of warnings. We had better hide them, since it's a quickstart 
guide for users.






[jira] [Resolved] (SPARK-37569) View Analysis incorrectly marks nested fields as nullable

2021-12-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37569.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34839
[https://github.com/apache/spark/pull/34839]

> View Analysis incorrectly marks nested fields as nullable
> -
>
> Key: SPARK-37569
> URL: https://issues.apache.org/jira/browse/SPARK-37569
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Shardul Mahadik
>Assignee: Shardul Mahadik
>Priority: Major
> Fix For: 3.3.0
>
>
> Consider a view as follows, with all fields non-nullable (required):
> {code:java}
> spark.sql("""
> CREATE OR REPLACE VIEW v2 AS 
> SELECT id, named_struct('a', id) AS nested
> FROM RANGE(10)
> """)
> {code}
> We can see that the view schema has been correctly stored as non-nullable:
> {code:java}
> scala> 
> System.out.println(spark.sessionState.catalog.externalCatalog.getTable("default",
>  "v2"))
> CatalogTable(
> Database: default
> Table: v2
> Owner: smahadik
> Created Time: Tue Dec 07 09:00:42 PST 2021
> Last Access: UNKNOWN
> Created By: Spark 3.3.0-SNAPSHOT
> Type: VIEW
> View Text: SELECT id, named_struct('a', id) AS nested
> FROM RANGE(10)
> View Original Text: SELECT id, named_struct('a', id) AS nested
> FROM RANGE(10)
> View Catalog and Namespace: spark_catalog.default
> View Query Output Columns: [id, nested]
> Table Properties: [transient_lastDdlTime=1638896442]
> Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> Storage Properties: [serialization.format=1]
> Schema: root
>  |-- id: long (nullable = false)
>  |-- nested: struct (nullable = false)
>  ||-- a: long (nullable = false)
> )
> {code}
> However, when reading this view, Spark incorrectly marks nested column 
> {{a}} as nullable:
> {code:java}
> scala> spark.table("v2").printSchema
> root
>  |-- id: long (nullable = false)
>  |-- nested: struct (nullable = false)
>  ||-- a: long (nullable = true)
> {code}
> This is caused by [this 
> line|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3546]
>  in Analyzer.scala. Going through the history of changes for this block of 
> code, it seems like {{asNullable}} is a remnant of a time before we added 
> [checks|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3543]
>  to ensure that the from and to types of the cast were compatible. As 
> nullability is already checked, it should be safe to add a cast without 
> converting the target datatype to nullable.
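
A sketch of the change this analysis suggests (simplified and illustrative, not the actual patch; `viewOutput` and `expectedFields` are hypothetical names):

{code:java}
// Simplified sketch of the analyzer's view output mapping: cast each
// output attribute to the expected field type without widening it to
// nullable, since nullability compatibility is already checked.
val projectList = viewOutput.zip(expectedFields).map { case (attr, field) =>
  // before the fix: Cast(attr, field.dataType.asNullable)
  Alias(Cast(attr, field.dataType), field.name)()
}
{code}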






[jira] [Assigned] (SPARK-37569) View Analysis incorrectly marks nested fields as nullable

2021-12-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37569:
---

Assignee: Shardul Mahadik

> View Analysis incorrectly marks nested fields as nullable
> -
>
> Key: SPARK-37569
> URL: https://issues.apache.org/jira/browse/SPARK-37569
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Shardul Mahadik
>Assignee: Shardul Mahadik
>Priority: Major
>
> Consider a view as follows, with all fields non-nullable (required):
> {code:java}
> spark.sql("""
> CREATE OR REPLACE VIEW v2 AS 
> SELECT id, named_struct('a', id) AS nested
> FROM RANGE(10)
> """)
> {code}
> We can see that the view schema has been correctly stored as non-nullable:
> {code:java}
> scala> 
> System.out.println(spark.sessionState.catalog.externalCatalog.getTable("default",
>  "v2"))
> CatalogTable(
> Database: default
> Table: v2
> Owner: smahadik
> Created Time: Tue Dec 07 09:00:42 PST 2021
> Last Access: UNKNOWN
> Created By: Spark 3.3.0-SNAPSHOT
> Type: VIEW
> View Text: SELECT id, named_struct('a', id) AS nested
> FROM RANGE(10)
> View Original Text: SELECT id, named_struct('a', id) AS nested
> FROM RANGE(10)
> View Catalog and Namespace: spark_catalog.default
> View Query Output Columns: [id, nested]
> Table Properties: [transient_lastDdlTime=1638896442]
> Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> Storage Properties: [serialization.format=1]
> Schema: root
>  |-- id: long (nullable = false)
>  |-- nested: struct (nullable = false)
>  ||-- a: long (nullable = false)
> )
> {code}
> However, when reading this view, Spark incorrectly marks nested column 
> {{a}} as nullable:
> {code:java}
> scala> spark.table("v2").printSchema
> root
>  |-- id: long (nullable = false)
>  |-- nested: struct (nullable = false)
>  ||-- a: long (nullable = true)
> {code}
> This is caused by [this 
> line|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3546]
>  in Analyzer.scala. Going through the history of changes for this block of 
> code, it seems like {{asNullable}} is a remnant of a time before we added 
> [checks|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3543]
>  to ensure that the from and to types of the cast were compatible. As 
> nullability is already checked, it should be safe to add a cast without 
> converting the target datatype to nullable.






[jira] [Resolved] (SPARK-37590) Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests

2021-12-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37590.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34842
[https://github.com/apache/spark/pull/34842]

> Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests
> 
>
> Key: SPARK-37590
> URL: https://issues.apache.org/jira/browse/SPARK-37590
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.3.0
>
>
> Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests






[jira] [Assigned] (SPARK-37590) Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests

2021-12-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37590:
---

Assignee: Terry Kim

> Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests
> 
>
> Key: SPARK-37590
> URL: https://issues.apache.org/jira/browse/SPARK-37590
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>
> Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests






[jira] [Resolved] (SPARK-37300) TaskSchedulerImpl should ignore task finished event if its task was already in a finished state

2021-12-12 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi resolved SPARK-37300.
--
Fix Version/s: 3.3.0
 Assignee: hujiahua
   Resolution: Fixed

Issue resolved by https://github.com/apache/spark/pull/34578

> TaskSchedulerImpl should ignore task finished event if its task was already 
> in a finished state
> --
>
> Key: SPARK-37300
> URL: https://issues.apache.org/jira/browse/SPARK-37300
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: hujiahua
>Assignee: hujiahua
>Priority: Major
> Fix For: 3.3.0
>
>
> `TaskSchedulerImpl` handles task-finished events in `handleSuccessfulTask` and 
> `handleFailedTask`, but in some cases the task is already in a finished state, 
> in which case the task-finished event should be ignored.
> Case description: 
> When an executor finishes a task of some stage, the driver receives a 
> StatusUpdate event to handle it. At the same time, the driver may find that 
> the executor's heartbeat timed out, so the driver also needs to handle an 
> ExecutorLost event simultaneously. There is a race condition here, which can 
> leave TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong 
> results. More detailed description and discussion can be viewed at 
> https://issues.apache.org/jira/browse/SPARK-36575 and 
> https://github.com/apache/spark/pull/33872
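
A minimal sketch of the proposed guard (the names mirror the Spark scheduler, but the code is illustrative, not the actual patch):

{code:java}
// Drop a task-finished event if a racing code path (e.g. ExecutorLost
// handling) already marked the task finished, so that
// TaskSetManager.successful and tasksSuccessful stay consistent.
def handleSuccessfulTask(manager: TaskSetManager, tid: Long): Unit = {
  val info = manager.taskInfos(tid)
  if (info.finished) {
    return // duplicate event: the task was already counted
  }
  // ... normal success handling ...
}
{code}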






[jira] [Commented] (SPARK-37623) Support ANSI Aggregate Function: regr_slope

2021-12-12 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458132#comment-17458132
 ] 

jiaan.geng commented on SPARK-37623:


I'm working on it.

> Support ANSI Aggregate Function: regr_slope
> ---
>
> Key: SPARK-37623
> URL: https://issues.apache.org/jira/browse/SPARK-37623
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> REGR_SLOPE is an ANSI aggregate function; many databases support it.






[jira] [Created] (SPARK-37623) Support ANSI Aggregate Function: regr_slope

2021-12-12 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-37623:
--

 Summary: Support ANSI Aggregate Function: regr_slope
 Key: SPARK-37623
 URL: https://issues.apache.org/jira/browse/SPARK-37623
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.3.0
Reporter: jiaan.geng









[jira] [Updated] (SPARK-37623) Support ANSI Aggregate Function: regr_slope

2021-12-12 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-37623:
---
Description: REGR_SLOPE is an ANSI aggregate function; many databases 
support it.
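
For reference, a sketch of the expected ANSI semantics written with aggregates Spark already has (slope = covar_pop(y, x) / var_pop(x), intercept = avg(y) - slope * avg(x); the table and column names are illustrative):

{code:java}
sql("""
  SELECT covar_pop(y, x) / var_pop(x) AS slope,
         avg(y) - covar_pop(y, x) / var_pop(x) * avg(x) AS intercept
  FROM points
""")
{code}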

> Support ANSI Aggregate Function: regr_slope
> ---
>
> Key: SPARK-37623
> URL: https://issues.apache.org/jira/browse/SPARK-37623
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> REGR_SLOPE is an ANSI aggregate function; many databases support it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37481) Disappearance of skipped stages misleads the bug hunting

2021-12-12 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458131#comment-17458131
 ] 

wuyi commented on SPARK-37481:
--

The backport of the fix to 3.1/3.0 is still in progress.

> Disappearance of skipped stages misleads the bug hunting 
> 
>
> Key: SPARK-37481
> URL: https://issues.apache.org/jira/browse/SPARK-37481
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.2.0, 3.3.0
>
>
> With FetchFailedException and map stage retries:
> When rerunning the spark-sql shell with the original SQL in 
> [https://gist.github.com/yaooqinn/6acb7b74b343a6a6dffe8401f6b7b45c#gistcomment-3977315]
> !https://user-images.githubusercontent.com/8326978/143821530-ff498caa-abce-483d-a24b-315aacf7e0a0.png!
> 1. Stage 3 threw a FetchFailedException and caused itself and its parent 
> stage (stage 2) to retry.
> 2. Stage 2 was skipped before, but its attemptId was still 0, so when its 
> retry happened it was removed from `Skipped Stages`.
> The DAG of Job 2 no longer shows that stage 2 is skipped.
> !https://user-images.githubusercontent.com/8326978/143824666-6390b64a-a45b-4bc8-b05d-c5abbb28cdef.png!
> Besides, a retried stage usually has a subset of the tasks from the original 
> stage. If we mark it as an original one, the metrics might lead us into 
> pitfalls.






[jira] [Resolved] (SPARK-37481) Disappearance of skipped stages misleads the bug hunting

2021-12-12 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi resolved SPARK-37481.
--
Fix Version/s: 3.3.0
   3.2.0
 Assignee: Kent Yao
   Resolution: Fixed

Issue resolved by https://github.com/apache/spark/pull/34735

> Disappearance of skipped stages misleads the bug hunting 
> 
>
> Key: SPARK-37481
> URL: https://issues.apache.org/jira/browse/SPARK-37481
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.3.0, 3.2.0
>
>
> With FetchFailedException and map stage retries:
> When rerunning the spark-sql shell with the original SQL in 
> [https://gist.github.com/yaooqinn/6acb7b74b343a6a6dffe8401f6b7b45c#gistcomment-3977315]
> !https://user-images.githubusercontent.com/8326978/143821530-ff498caa-abce-483d-a24b-315aacf7e0a0.png!
> 1. Stage 3 threw a FetchFailedException and caused itself and its parent 
> stage (stage 2) to retry.
> 2. Stage 2 was skipped before, but its attemptId was still 0, so when its 
> retry happened it was removed from `Skipped Stages`.
> The DAG of Job 2 no longer shows that stage 2 is skipped.
> !https://user-images.githubusercontent.com/8326978/143824666-6390b64a-a45b-4bc8-b05d-c5abbb28cdef.png!
> Besides, a retried stage usually has a subset of the tasks from the original 
> stage. If we mark it as an original one, the metrics might lead us into 
> pitfalls.






[jira] [Commented] (SPARK-37620) Use more precise types for SparkContext Optional fields (i.e. _gateway, _jvm)

2021-12-12 Thread Byron Hsu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458128#comment-17458128
 ] 

Byron Hsu commented on SPARK-37620:
---

I will work on it.

> Use more precise types for SparkContext Optional fields  (i.e. _gateway, _jvm)
> --
>
> Key: SPARK-37620
> URL: https://issues.apache.org/jira/browse/SPARK-37620
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> As part of SPARK-37152 [we 
> agreed|https://github.com/apache/spark/pull/34466/files#r762609181] to keep 
> these typed as non-{{Optional}} for simplicity, but this is just a temporary 
> solution to move things forward until we decide on the best approach to 
> handle such cases.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37369) Avoid redundant ColumnarToRow transition on InMemoryTableScan

2021-12-12 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-37369.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34642
[https://github.com/apache/spark/pull/34642]

> Avoid redundant ColumnarToRow transition on InMemoryTableScan
> --
>
> Key: SPARK-37369
> URL: https://issues.apache.org/jira/browse/SPARK-37369
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.3.0
>
>
> We have a rule that inserts a columnar transition between row-based and columnar 
> query plans. InMemoryTableScanExec can produce columnar output, so if its 
> parent plan isn't columnar, the rule adds a ColumnarToRow between them.
> But InMemoryTableScanExec is a special query plan because it can convert 
> cached batches to either columnar batches or rows. 
> In that case, we currently ask InMemoryTableScanExec to convert cached batches 
> to columnar batches, which are then converted back to rows in the added 
> ColumnarToRow before reaching the parent plan.
> Instead, we can simply ask InMemoryTableScanExec to produce row 
> output and skip the redundant conversion.
> ```
> +- Union
>    :- ColumnarToRow
>    :  +- InMemoryTableScan [i#8, j#9]
>    :       +- InMemoryRelation [i#8, j#9], StorageLevel(disk, memory, deserialized, 1 replicas)
> ```
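
A self-contained sketch of the proposed behavior with simplified, illustrative types (not Spark's real operators):

{code:scala}
sealed trait Plan
case class ColumnarToRow(child: Plan) extends Plan
// `columnarOutput` stands in for InMemoryTableScanExec's output format choice.
case class InMemoryScan(columnarOutput: Boolean) extends Plan

// Before: a row-based parent gets a conversion node on top of a columnar scan.
def before(scan: InMemoryScan): Plan = ColumnarToRow(scan.copy(columnarOutput = true))

// After: the scan decodes cached batches straight to rows; no conversion node.
def after(scan: InMemoryScan): Plan = scan.copy(columnarOutput = false)
{code}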



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37369) Avoid redundant ColumnarToRow transition on InMemoryTableScan

2021-12-12 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-37369:
---

Assignee: L. C. Hsieh

> Avoid redundant ColumnarToRow transition on InMemoryTableScan
> --
>
> Key: SPARK-37369
> URL: https://issues.apache.org/jira/browse/SPARK-37369
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> We have a rule that inserts a columnar transition between row-based and columnar 
> query plans. InMemoryTableScanExec can produce columnar output, so if its 
> parent plan isn't columnar, the rule adds a ColumnarToRow between them.
> But InMemoryTableScanExec is a special query plan because it can convert 
> cached batches to either columnar batches or rows. 
> In that case, we currently ask InMemoryTableScanExec to convert cached batches 
> to columnar batches, which are then converted back to rows in the added 
> ColumnarToRow before reaching the parent plan.
> Instead, we can simply ask InMemoryTableScanExec to produce row 
> output and skip the redundant conversion.
> ```
> +- Union
>    :- ColumnarToRow
>    :  +- InMemoryTableScan [i#8, j#9]
>    :       +- InMemoryRelation [i#8, j#9], StorageLevel(disk, memory, deserialized, 1 replicas)
> ```



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37622) Support K8s executor rolling policy

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458116#comment-17458116
 ] 

Apache Spark commented on SPARK-37622:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/34874

> Support K8s executor rolling policy
> ---
>
> Key: SPARK-37622
> URL: https://issues.apache.org/jira/browse/SPARK-37622
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37622) Support K8s executor rolling policy

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37622:


Assignee: (was: Apache Spark)

> Support K8s executor rolling policy
> ---
>
> Key: SPARK-37622
> URL: https://issues.apache.org/jira/browse/SPARK-37622
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37622) Support K8s executor rolling policy

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37622:


Assignee: Apache Spark

> Support K8s executor rolling policy
> ---
>
> Key: SPARK-37622
> URL: https://issues.apache.org/jira/browse/SPARK-37622
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37622) Support K8s executor rolling policy

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458115#comment-17458115
 ] 

Apache Spark commented on SPARK-37622:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/34874

> Support K8s executor rolling policy
> ---
>
> Key: SPARK-37622
> URL: https://issues.apache.org/jira/browse/SPARK-37622
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36038) Basic speculation metrics at stage level

2021-12-12 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-36038.

  Assignee: Thejdeep Gudivada
Resolution: Fixed

Issue resolved in https://github.com/apache/spark/pull/34607

> Basic speculation metrics at stage level
> 
>
> Key: SPARK-36038
> URL: https://issues.apache.org/jira/browse/SPARK-36038
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Venkata krishnan Sowrirajan
>Assignee: Thejdeep Gudivada
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently there are no speculation metrics available at either the application 
> level or the stage level. Within our platform, we have added speculation 
> metrics at the stage level as a summary, similar to the existing stage-level 
> metrics, tracking numTotalSpeculated, numCompleted (successful), numFailed, 
> numKilled, etc. This enables us to effectively understand the speculative 
> execution feature at an application level and helps in further tuning the 
> speculation configs.
> cc [~ron8hu]
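
An illustrative shape for the stage-level summary described above (field names follow the description; not necessarily the merged API):

{code:scala}
// Sketch of a per-stage speculation summary, mirroring the counters named above.
case class SpeculationStageSummary(
    numTotalSpeculated: Int,
    numCompleted: Int, // successful speculative tasks
    numFailed: Int,
    numKilled: Int)
{code}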



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37622) Support K8s executor rolling policy

2021-12-12 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-37622:
-

 Summary: Support K8s executor rolling policy
 Key: SPARK-37622
 URL: https://issues.apache.org/jira/browse/SPARK-37622
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37619) Upgrade Maven to 3.8.4

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37619:


Assignee: Dongjoon Hyun

> Upgrade Maven to 3.8.4
> --
>
> Key: SPARK-37619
> URL: https://issues.apache.org/jira/browse/SPARK-37619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37619) Upgrade Maven to 3.8.4

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37619.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34873
[https://github.com/apache/spark/pull/34873]

> Upgrade Maven to 3.8.4
> --
>
> Key: SPARK-37619
> URL: https://issues.apache.org/jira/browse/SPARK-37619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37603) org.apache.spark.sql.catalyst.expressions.ListQuery; no valid constructor

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37603.
--
Resolution: Cannot Reproduce

> org.apache.spark.sql.catalyst.expressions.ListQuery; no valid constructor
> -
>
> Key: SPARK-37603
> URL: https://issues.apache.org/jira/browse/SPARK-37603
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: huangweiguo
>Priority: Major
>
> I don't use the class org.apache.spark.sql.catalyst.expressions.ListQuery, 
> but it fails with a serialization exception (maybe):
>  
> {code:java}
> //stack
> Caused by: java.io.InvalidClassException: 
> org.apache.spark.sql.catalyst.expressions.ListQuery; no valid constructor
>   at 
> java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:150)
>   at 
> java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:790)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2001)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> 
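
For background, the "no valid constructor" failure mode is generic Java serialization behavior, not Spark-specific: deserialization must re-create the closest non-serializable superclass through an accessible no-arg constructor. A minimal sketch:

{code:scala}
import java.io._

class Base(val tag: String)                     // not Serializable, no no-arg constructor
class Leaf(tag: String) extends Base(tag) with Serializable

val bytes = new ByteArrayOutputStream()
new ObjectOutputStream(bytes).writeObject(new Leaf("x")) // writing succeeds
// Reading back throws java.io.InvalidClassException: ... no valid constructor,
// from the same checkDeserialize path seen in the stack trace above.
new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)).readObject()
{code}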

[jira] [Commented] (SPARK-37607) Refactor antlr4 syntax file

2021-12-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458105#comment-17458105
 ] 

Hyukjin Kwon commented on SPARK-37607:
--

[~melin] Can you double-check the link? It shows "page not found".

> Refactor antlr4 syntax file
> ---
>
> Key: SPARK-37607
> URL: https://issues.apache.org/jira/browse/SPARK-37607
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> Referring to the ShardingSphere project: its grammar is split into multiple 
> files by type, which is very clear.
> https://github.com/apache/shardingsphere/tree/master/shardingsphere-sql-parser/shardingsphere-sql-parser-dialect/shardingsphere-sql-parser-mysql/src/main/antlr4/imports/mysql



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37608) Can anyone help me to compile a scala file

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37608.
--
Resolution: Invalid

> Can anyone help me to compile a scala file
> --
>
> Key: SPARK-37608
> URL: https://issues.apache.org/jira/browse/SPARK-37608
> Project: Spark
>  Issue Type: Question
>  Components: SQL
>Affects Versions: 2.2.3
> Environment: I am currently using spark 2.3.2.3.1.5.0-152 and Scala 
> 2.11.12
>Reporter: Gary Liu
>Priority: Major
>
> I am working with Spark SQL, trying to read data from SAS using JDBC. I have 
> to use a customized JdbcDialect to handle the special format of the SAS data. 
> Here is the code:
> {code:java}
> import org.apache.spark.sql.jdbc.JdbcDialect
> 
> object SasDialect extends JdbcDialect {
>   override def canHandle(url: String): Boolean =
>     url.startsWith("jdbc:sharenet")
>   // SAS name literals take the form "name"n, hence the trailing n
>   override def quoteIdentifier(colName: String): String =
>     "\"" + colName + "\"n"
> }
> {code}
> It worked well in Scala, but my main language is Python, so I need to 
> compile this into a jar file so I can call it from Python, like
> {code:python}
> from py4j.java_gateway import java_import
> gw = spark.sparkContext._gateway
> java_import(gw.jvm, "com.me.SasJDBCDialect")
> gw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(
>   gw.jvm.com.me.SasJDBCDialect()){code}
> But I am a newbie in Scala and don't know how to build it into a jar file so 
> it can be called as "com.me.MyJDBCDialect". Can anyone here do me a favour and 
> generate a jar file for me?
>  
> Thanks!
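
A minimal sbt packaging sketch for the question above (project name and versions are assumptions; adjust to your setup). `sbt package` then produces a jar under target/scala-2.11/ that can be passed to pyspark via --jars:

{code:scala}
// build.sbt - put the dialect source under src/main/scala/com/me/SasDialect.scala
// and add "package com.me" at the top of that file so it is reachable from py4j.
name := "sas-jdbc-dialect"           // assumed project name
version := "0.1.0"
scalaVersion := "2.11.12"            // matches the reporter's Scala version
// "provided": Spark classes come from the cluster, not from the jar.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided"
{code}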



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37607) Refactor antlr4 syntax file

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37607:
-
Summary: Refactor antlr4 syntax file  (was: Optimized antlr4 syntax file)

> Refactor antlr4 syntax file
> ---
>
> Key: SPARK-37607
> URL: https://issues.apache.org/jira/browse/SPARK-37607
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> Refer to the ShardingSphere project, split into multiple files by type, very 
> clear
> https://github.com/apache/shardingsphere/tree/master/shardingsphere-sql-parser/shardingsphere-sql-parser-dialect/shardin
>  gsphere-sql-parser-mysql/src/main/antlr4/imports/mysql



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37608) Can anyone help me to compile a scala file

2021-12-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458104#comment-17458104
 ] 

Hyukjin Kwon commented on SPARK-37608:
--

For questions, let's discuss on the Spark mailing list instead of filing a 
ticket - you would be able to get better answers there.

> Can anyone help me to compile a scala file
> --
>
> Key: SPARK-37608
> URL: https://issues.apache.org/jira/browse/SPARK-37608
> Project: Spark
>  Issue Type: Question
>  Components: SQL
>Affects Versions: 2.2.3
> Environment: I am currently using spark 2.3.2.3.1.5.0-152 and Scala 
> 2.11.12
>Reporter: Gary Liu
>Priority: Major
>
> I am working with Spark SQL, trying to read data from SAS using JDBC. I have 
> to use a customized JdbcDialect to handle the special format of the SAS data. 
> Here is the code:
> {code:java}
> import org.apache.spark.sql.jdbc.JdbcDialect
> 
> object SasDialect extends JdbcDialect {
>   override def canHandle(url: String): Boolean =
>     url.startsWith("jdbc:sharenet")
>   // SAS name literals take the form "name"n, hence the trailing n
>   override def quoteIdentifier(colName: String): String =
>     "\"" + colName + "\"n"
> }
> {code}
> It worked well in Scala, but my main language is Python, so I need to 
> compile this into a jar file so I can call it from Python, like
> {code:python}
> from py4j.java_gateway import java_import
> gw = spark.sparkContext._gateway
> java_import(gw.jvm, "com.me.SasJDBCDialect")
> gw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(
>   gw.jvm.com.me.SasJDBCDialect()){code}
> But I am a newbie in Scala and don't know how to build it into a jar file so 
> it can be called as "com.me.MyJDBCDialect". Can anyone here do me a favour and 
> generate a jar file for me?
>  
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan

2021-12-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458103#comment-17458103
 ] 

Hyukjin Kwon commented on SPARK-37609:
--

[~ravwojdyla] I don't think people will dive into reproducing and debugging 
this further as is. It would be great to have a minimised, self-contained 
reproducer here.

> Transient StackOverflowError on DataFrame from Catalyst QueryPlan
> -
>
> Key: SPARK-37609
> URL: https://issues.apache.org/jira/browse/SPARK-37609
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.2
> Environment: py:3.9
>Reporter: Rafal Wojdyla
>Priority: Major
>
> I sporadically observe a StackOverflowError from Catalyst's QueryPlan (for a 
> relatively complicated query); below is a stack trace from calling {{count}} on 
> that DF. It's a bit troubling because the error is transient: with enough 
> retries (no change to the code; probably some kind of cache?), I can get the 
> operation to work :(
> {noformat}
> ---
> Py4JJavaError Traceback (most recent call last)
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/dataframe.py 
> in count(self)
> 662 2
> 663 """
> --> 664 return int(self._jdf.count())
> 665 
> 666 def collect(self):
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/java_gateway.py in 
> __call__(self, *args)
>1302 
>1303 answer = self.gateway_client.send_command(command)
> -> 1304 return_value = get_return_value(
>1305 answer, self.gateway_client, self.target_id, self.name)
>1306 
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/utils.py in 
> deco(*a, **kw)
> 109 def deco(*a, **kw):
> 110 try:
> --> 111 return f(*a, **kw)
> 112 except py4j.protocol.Py4JJavaError as e:
> 113 converted = convert_exception(e.java_exception)
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/protocol.py in 
> get_return_value(answer, gateway_client, target_id, name)
> 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
> 325 if answer[1] == REFERENCE_TYPE:
> --> 326 raise Py4JJavaError(
> 327 "An error occurred while calling {0}{1}{2}.\n".
> 328 format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling o9123.count.
> : java.lang.StackOverflowError
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:188)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
> ...
> {noformat}
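
Since the trace above points at deep Catalyst plan recursion on the driver, one hedged mitigation (an assumption about this case, not a fix for the underlying issue) is to run the action on a thread with a larger stack:

{code:scala}
// Workaround sketch: a thread with an explicit, larger stack size.
def withLargeStack[T](stackSizeBytes: Long = 64L * 1024 * 1024)(body: => T): T = {
  @volatile var result: Either[Throwable, T] = Left(new IllegalStateException("not run"))
  val runnable = new Runnable {
    def run(): Unit = result = try Right(body) catch { case e: Throwable => Left(e) }
  }
  // Thread(group, runnable, name, stackSize) lets us request a bigger stack.
  val t = new Thread(null, runnable, "large-stack", stackSizeBytes)
  t.start()
  t.join()
  result match {
    case Left(e)  => throw e
    case Right(v) => v
  }
}

// Illustrative usage: withLargeStack() { df.count() }
{code}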



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37610) Anonymized/obfuscated query plan?

2021-12-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458102#comment-17458102
 ] 

Hyukjin Kwon commented on SPARK-37610:
--

I think it's better to have a workload-specific script to address this. I doubt 
we can have a generalized way to hide sensitive information for the whole 
query plan, since it can contain arbitrary information.
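
A sketch of such a workload-specific script (the regex targets Catalyst's `name#exprId` identifiers in explain() output; purely illustrative, not a general solution):

{code:scala}
import scala.collection.mutable

// Replace column identifiers like `price#12` with stable placeholders
// (col0, col1, ...) before sharing the captured plan text.
def anonymizePlan(planText: String): String = {
  val ids = "[A-Za-z_][A-Za-z0-9_]*#\\d+".r
  val mapping = mutable.Map.empty[String, String]
  ids.replaceAllIn(planText, m =>
    mapping.getOrElseUpdate(m.matched, s"col${mapping.size}"))
}
{code}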

> Anonymized/obfuscated query plan?
> -
>
> Key: SPARK-37610
> URL: https://issues.apache.org/jira/browse/SPARK-37610
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Rafal Wojdyla
>Priority: Major
>
> I would like to share a query plan for a specific 
> issue (https://issues.apache.org/jira/browse/SPARK-37609), but can't without 
> at least anonymising the column names. If I could call an {{explain}} that 
> anonymises/obfuscates the column names, that would be useful.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37610) Anonymized/obfuscated query plan

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37610:
-
Summary: Anonymized/obfuscated query plan  (was: Anonymized/obfuscated 
query plan?)

> Anonymized/obfuscated query plan
> 
>
> Key: SPARK-37610
> URL: https://issues.apache.org/jira/browse/SPARK-37610
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Rafal Wojdyla
>Priority: Major
>
> I would like to share a query plan for a specific 
> issue (https://issues.apache.org/jira/browse/SPARK-37609), but can't without 
> at least anonymising the column names. If I could call an {{explain}} that 
> anonymises/obfuscates the column names, that would be useful.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37621) ClassCastException when trying to persist the result of a join between two Iceberg tables

2021-12-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458101#comment-17458101
 ] 

Hyukjin Kwon commented on SPARK-37621:
--

Is this an issue specific to Iceberg, or does it affect other sources in Spark too?

> ClassCastException when trying to persist the result of a join between two 
> Iceberg tables
> -
>
> Key: SPARK-37621
> URL: https://issues.apache.org/jira/browse/SPARK-37621
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.1.2
>Reporter: Ciprian Gerea
>Priority: Major
>
> I am getting an error when I try to persist the result of a join operation. 
> Note that both tables to be joined and the output table are Iceberg tables.
> SQL code to repro. 
> String sqlJoin = String.format(
> "SELECT * from " +
> "((select %s from %s.%s where %s ) l " +
> "join (select %s from %s.%s where %s ) r " +
> "using (%s))",
> );
> spark.sql(sqlJoin).writeTo("ciptest.ttt").option("write-format", 
> "parquet").createOrReplace();
> My exception stack is:
> {{Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast 
> to org.apache.spark.sql.catalyst.expressions.UnsafeRow}}
> {{at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$1.writeValue(UnsafeRowSerializer.scala:64)}}
> {{at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:249)}}
> {{at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:158)}}
> {{at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)}}
> {{at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)}}
> {{at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)}}
> {{at org.apache.spark.scheduler.Task.run(Task.scala:131)}}
> {{at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)}}
> {{at ….}}
> Explain on the Sql statement gets the following plan:
> {{== Physical Plan ==}}
> {{Project [ ... ]}}
> {{+- SortMergeJoin […], Inner}}
> {{  :- Sort […], false, 0}}
> {{  : +- Exchange hashpartitioning(…, 10), ENSURE_REQUIREMENTS, [id=#38]}}
> {{  :   +- Filter (…)}}
> {{  :+- BatchScan[... ] left [filters=…]}}
> {{  +- *(2) Sort […], false, 0}}
> {{   +- Exchange hashpartitioning(…, 10), ENSURE_REQUIREMENTS, [id=#47]}}
> {{ +- *(1) Filter (…)}}
> {{  +- BatchScan[…] right [filters=…] }}
> Note that several variations of this fail. Besides the repro code listed 
> above, I have tried doing CTAS and writing the result into parquet 
> files without making a table out of it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37617) In CTAS, replace aliases with normal columns that satisfy the parquet schema

2021-12-12 Thread jiahong.li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiahong.li updated SPARK-37617:
---
Description: In CTAS, replace unaliased columns with named columns. Typically, 
a column without an alias is the result of an operator such as sum or divide, 
which leads to a Parquet schema check error.  (was: In CTAS, replace unaliased 
columns with named columns. Typically, a column without an alias is the result 
of an operator such as sum or divide, which leads to a schema check error.)

> In CTAS, replace aliases with normal columns that satisfy the parquet schema
> --
>
> Key: SPARK-37617
> URL: https://issues.apache.org/jira/browse/SPARK-37617
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiahong.li
>Priority: Minor
> Fix For: 3.2.0
>
>
> In CTAS, replace unaliased columns with named columns. Typically, a column 
> without an alias is the result of an operator such as sum or divide, which 
> leads to a Parquet schema check error.
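
An illustrative repro of the problem described above (table names are assumptions): Parquet rejects the auto-generated name of an unaliased expression, while an alias avoids the schema check error.

{code:scala}
// Fails the Parquet schema check: the auto-generated column name contains
// characters such as '(' that Parquet rejects.
spark.sql("CREATE TABLE t_out USING parquet AS SELECT sum(a) / count(b) FROM t_in")

// Works: the alias gives the column a Parquet-legal name.
spark.sql("CREATE TABLE t_out USING parquet AS SELECT sum(a) / count(b) AS ratio FROM t_in")
{code}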



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37598) Pyspark's newAPIHadoopRDD() method fails with ShortWritables

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37598:


Assignee: Keith Massey

> Pyspark's newAPIHadoopRDD() method fails with ShortWritables
> 
>
> Key: SPARK-37598
> URL: https://issues.apache.org/jira/browse/SPARK-37598
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.8, 3.0.3, 3.1.2, 3.2.0
>Reporter: Keith Massey
>Assignee: Keith Massey
>Priority: Minor
>
> If sc.newAPIHadoopRDD() is called from PySpark using an InputFormat that has 
> a ShortWritable as a field, then the call to newAPIHadoopRDD() fails. The 
> reason is that ShortWritable is not explicitly handled by PythonHadoopUtil 
> the way that other numeric writables are (like LongWritable). The result is 
> that the ShortWritable is not converted to an object that can be serialized 
> by Spark, and a serialization error occurs. Below is an example stack trace 
> from within the pyspark shell:
> {code:java}
> >>> rdd = sc.newAPIHadoopRDD(inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
> ... keyClass="org.apache.hadoop.io.NullWritable",
> ... valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
> ... conf=conf)
> 2021-12-08 14:38:40,439 ERROR scheduler.TaskSetManager: task 0.0 in stage 
> 15.0 (TID 31) had a not serializable result: 
> org.apache.hadoop.io.ShortWritable
> Serialization stack:
> - object not serializable (class: 
> org.apache.hadoop.io.ShortWritable, value: 1)
> - writeObject data (class: java.util.HashMap)
> - object (class java.util.HashMap, \{price=1})
> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
> - object (class scala.Tuple2, (1,\{price=1}))
> - element of array (index: 0)
> - array (class [Lscala.Tuple2;, size 1); not retrying
> Traceback (most recent call last):
>  File "", line 4, in 
>  File "/home/hduser/spark-3.1.2-bin-hadoop3.2/python/pyspark/context.py", 
> line 853, in newAPIHadoopRDD
>   jconf, batchSize)
>  File 
> "/home/hduser/spark-3.1.2-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py",
>  line 1305, in __call__
>  File "/home/hduser/spark-3.1.2-bin-hadoop3.2/python/pyspark/sql/utils.py", 
> line 111, in deco
>   return f(*a, **kw)
>  File 
> "/home/hduser/spark-3.1.2-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py",
>  line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
> : org.apache.spark.SparkException: Job aborted due to stage failure: task 0.0 
> in stage 15.0 (TID 31) had a not serializable result: 
> org.apache.hadoop.io.ShortWritable
> Serialization stack:
> - object not serializable (class: 
> org.apache.hadoop.io.ShortWritable, value: 1)
> - writeObject data (class: java.util.HashMap)
> - object (class java.util.HashMap, \{price=1})
> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
> - object (class scala.Tuple2, (1,\{price=1}))
> - element of array (index: 0)
> - array (class [Lscala.Tuple2;, size 1)
> at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
> at scala.Option.foreach(Option.scala:407)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
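
A rough sketch of the missing conversion (illustrative, not PythonHadoopUtil's actual code): numeric Writables get unwrapped into plain JVM values Spark can serialize, and ShortWritable needs the same case:

{code:scala}
import org.apache.hadoop.io.{IntWritable, LongWritable, ShortWritable, Writable}

def convertWritable(w: Writable): Any = w match {
  case iw: IntWritable   => iw.get()
  case lw: LongWritable  => lw.get()
  case sw: ShortWritable => sw.get() // the case this issue adds
  case other             => other    // unconverted Writables later fail to serialize
}
{code}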

[jira] [Resolved] (SPARK-37598) Pyspark's newAPIHadoopRDD() method fails with ShortWritables

2021-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37598.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34838
[https://github.com/apache/spark/pull/34838]

> Pyspark's newAPIHadoopRDD() method fails with ShortWritables
> 
>
> Key: SPARK-37598
> URL: https://issues.apache.org/jira/browse/SPARK-37598
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.8, 3.0.3, 3.1.2, 3.2.0
>Reporter: Keith Massey
>Assignee: Keith Massey
>Priority: Minor
> Fix For: 3.3.0
>
>
> If sc.newAPIHadoopRDD() is called from PySpark using an InputFormat that has 
> a ShortWritable as a field, then the call to newAPIHadoopRDD() fails. The 
> reason is that ShortWritable is not explicitly handled by PythonHadoopUtil 
> the way that other numeric writables are (like LongWritable). The result is 
> that the ShortWritable is not converted to an object that can be serialized 
> by Spark, and a serialization error occurs. Below is an example stack trace 
> from within the pyspark shell:
> {code:java}
> >>> rdd = sc.newAPIHadoopRDD(inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
> ... keyClass="org.apache.hadoop.io.NullWritable",
> ... valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
> ... conf=conf)
> 2021-12-08 14:38:40,439 ERROR scheduler.TaskSetManager: task 0.0 in stage 
> 15.0 (TID 31) had a not serializable result: 
> org.apache.hadoop.io.ShortWritable
> Serialization stack:
> - object not serializable (class: 
> org.apache.hadoop.io.ShortWritable, value: 1)
> - writeObject data (class: java.util.HashMap)
> - object (class java.util.HashMap, \{price=1})
> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
> - object (class scala.Tuple2, (1,\{price=1}))
> - element of array (index: 0)
> - array (class [Lscala.Tuple2;, size 1); not retrying
> Traceback (most recent call last):
>  File "", line 4, in 
>  File "/home/hduser/spark-3.1.2-bin-hadoop3.2/python/pyspark/context.py", 
> line 853, in newAPIHadoopRDD
>   jconf, batchSize)
>  File 
> "/home/hduser/spark-3.1.2-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py",
>  line 1305, in __call__
>  File "/home/hduser/spark-3.1.2-bin-hadoop3.2/python/pyspark/sql/utils.py", 
> line 111, in deco
>   return f(*a, **kw)
>  File 
> "/home/hduser/spark-3.1.2-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py",
>  line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
> : org.apache.spark.SparkException: Job aborted due to stage failure: task 0.0 
> in stage 15.0 (TID 31) had a not serializable result: 
> org.apache.hadoop.io.ShortWritable
> Serialization stack:
> - object not serializable (class: 
> org.apache.hadoop.io.ShortWritable, value: 1)
> - writeObject data (class: java.util.HashMap)
> - object (class java.util.HashMap, \{price=1})
> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
> - object (class scala.Tuple2, (1,\{price=1}))
> - element of array (index: 0)
> - array (class [Lscala.Tuple2;, size 1)
> at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
> at scala.Option.foreach(Option.scala:407)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> at 

[jira] [Created] (SPARK-37621) ClassCastException when trying to persist the result of a join between two Iceberg tables

2021-12-12 Thread Ciprian Gerea (Jira)
Ciprian Gerea created SPARK-37621:
-

 Summary: ClassCastException when trying to persist the result of a 
join between two Iceberg tables
 Key: SPARK-37621
 URL: https://issues.apache.org/jira/browse/SPARK-37621
 Project: Spark
  Issue Type: Bug
  Components: Input/Output
Affects Versions: 3.1.2
Reporter: Ciprian Gerea


I am getting an error when I try to persist the result of a join operation. 
Note that both tables to be joined and the output table are Iceberg tables.

SQL code to repro. 


String sqlJoin = String.format(
"SELECT * from " +
"((select %s from %s.%s where %s ) l " +
"join (select %s from %s.%s where %s ) r " +
"using (%s))",
);
spark.sql(sqlJoin).writeTo("ciptest.ttt").option("write-format", 
"parquet").createOrReplace();

My exception stack is:
{{Caused by: java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast to 
org.apache.spark.sql.catalyst.expressions.UnsafeRow}}
{{  at 
org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$1.writeValue(UnsafeRowSerializer.scala:64)}}
{{  at 
org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:249)}}
{{  at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:158)}}
{{  at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)}}
{{  at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)}}
{{  at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)}}
{{  at org.apache.spark.scheduler.Task.run(Task.scala:131)}}
{{  at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)}}
{{  at ….}}

Explain on the Sql statement gets the following plan:
{{== Physical Plan ==}}
{{Project [ ... ]}}
{{+- SortMergeJoin […], Inner}}
{{  :- Sort […], false, 0}}
{{  : +- Exchange hashpartitioning(…, 10), ENSURE_REQUIREMENTS, [id=#38]}}
{{  :   +- Filter (…)}}
{{  :+- BatchScan[... ] left [filters=…]}}
{{  +- *(2) Sort […], false, 0}}
{{   +- Exchange hashpartitioning(…, 10), ENSURE_REQUIREMENTS, [id=#47]}}
{{ +- *(1) Filter (…)}}
{{  +- BatchScan[…] right [filters=…] }}


Note that several variations of this fail. Besides the repro code listed 
above, I have tried doing CTAS and writing the result into parquet files 
without making a table out of it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37620) Use more precise types for SparkContext Optional fields (i.e. _gateway, _jvm)

2021-12-12 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458050#comment-17458050
 ] 

Maciej Szymkiewicz commented on SPARK-37620:


FYI [~ueshin] [~itholic] [~XinrongM] [~hyukjin.kwon] [~byronhsu]

> Use more precise types for SparkContext Optional fields  (i.e. _gateway, _jvm)
> --
>
> Key: SPARK-37620
> URL: https://issues.apache.org/jira/browse/SPARK-37620
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> As a part of SPARK-37152 [we 
> agreed|https://github.com/apache/spark/pull/34466/files#r762609181] to keep 
> these typed as non-{{Optional}} for simplicity, but this is just a temporary 
> solution to move things forward until we decide on the best approach to handle 
> such cases.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37620) Use more precise types for SparkContext Optional fields (i.e. _gateway, _jvm)

2021-12-12 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-37620:
--

 Summary: Use more precise types for SparkContext Optional fields  
(i.e. _gateway, _jvm)
 Key: SPARK-37620
 URL: https://issues.apache.org/jira/browse/SPARK-37620
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Maciej Szymkiewicz


As a part of SPARK-37152 [we 
agreed|https://github.com/apache/spark/pull/34466/files#r762609181] to keep 
these typed as non-{{Optional}} for simplicity, but this is just a temporary 
solution to move things forward until we decide on the best approach to handle 
such cases.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37152) Inline type hints for python/pyspark/context.py

2021-12-12 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz resolved SPARK-37152.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34466
[https://github.com/apache/spark/pull/34466]

> Inline type hints for python/pyspark/context.py
> ---
>
> Key: SPARK-37152
> URL: https://issues.apache.org/jira/browse/SPARK-37152
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Assignee: Byron Hsu
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37152) Inline type hints for python/pyspark/context.py

2021-12-12 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz reassigned SPARK-37152:
--

Assignee: Byron Hsu

> Inline type hints for python/pyspark/context.py
> ---
>
> Key: SPARK-37152
> URL: https://issues.apache.org/jira/browse/SPARK-37152
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Assignee: Byron Hsu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37619) Upgrade Maven to 3.8.4

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37619:


Assignee: (was: Apache Spark)

> Upgrade Maven to 3.8.4
> --
>
> Key: SPARK-37619
> URL: https://issues.apache.org/jira/browse/SPARK-37619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37619) Upgrade Maven to 3.8.4

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458019#comment-17458019
 ] 

Apache Spark commented on SPARK-37619:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/34873

> Upgrade Maven to 3.8.4
> --
>
> Key: SPARK-37619
> URL: https://issues.apache.org/jira/browse/SPARK-37619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37619) Upgrade Maven to 3.8.4

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37619:


Assignee: Apache Spark

> Upgrade Maven to 3.8.4
> --
>
> Key: SPARK-37619
> URL: https://issues.apache.org/jira/browse/SPARK-37619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37619) Upgrade Maven to 3.8.4

2021-12-12 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-37619:
-

 Summary: Upgrade Maven to 3.8.4
 Key: SPARK-37619
 URL: https://issues.apache.org/jira/browse/SPARK-37619
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37618) Support cleaning up shuffle blocks from external shuffle service

2021-12-12 Thread Adam Binford (Jira)
Adam Binford created SPARK-37618:


 Summary: Support cleaning up shuffle blocks from external shuffle 
service
 Key: SPARK-37618
 URL: https://issues.apache.org/jira/browse/SPARK-37618
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 3.2.0
Reporter: Adam Binford


Currently, shuffle data is not cleaned up when an external shuffle service is 
used and the associated executor is deallocated before the shuffle itself is 
cleaned up. In that case, the shuffle data is only removed once the application 
ends.

There have been various issues filed for this:

https://issues.apache.org/jira/browse/SPARK-26020

https://issues.apache.org/jira/browse/SPARK-17233

https://issues.apache.org/jira/browse/SPARK-4236

But shuffle files will still stick around until the application completes. 
Dynamic allocation is commonly used for long-running jobs (such as structured 
streaming), so any long-running job with a large shuffle involved will 
eventually fill up local disk space. The shuffle service already supports 
cleaning up shuffle-service-persisted RDDs, so it should be able to support 
cleaning up shuffle blocks as well once the shuffle is removed by the 
ContextCleaner. 

The current alternative is to use shuffle tracking instead of an external 
shuffle service, but this is less optimal from a resource perspective, as all 
executors must be kept alive until the shuffle has been fully consumed and 
cleaned up (and with the default GC interval of 30 minutes, this can waste a 
lot of time with executors held onto but not doing anything).
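
For reference, a configuration sketch of that shuffle-tracking alternative (these are real Spark configs; the GC interval shown is the 30-minute default mentioned above):

{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  // Track shuffle state instead of relying on the external shuffle service;
  // executors stay alive until their shuffle data is consumed.
  .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .set("spark.cleaner.periodicGC.interval", "30min")
{code}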



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37617) In CTAS, replace aliases with normal columns that satisfy the parquet schema

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37617:


Assignee: Apache Spark

> In CTAS, replace aliases with normal columns that satisfy the parquet schema
> --
>
> Key: SPARK-37617
> URL: https://issues.apache.org/jira/browse/SPARK-37617
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiahong.li
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.2.0
>
>
> In CTAS, replace unaliased columns with named columns. Typically, a column 
> without an alias is the result of an operator such as sum or divide, which 
> leads to a schema check error.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37617) In CTAS, replace aliases with normal columns that satisfy the parquet schema

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37617:


Assignee: (was: Apache Spark)

> In CTAS, replace aliases with normal columns that satisfy the parquet schema
> --
>
> Key: SPARK-37617
> URL: https://issues.apache.org/jira/browse/SPARK-37617
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiahong.li
>Priority: Minor
> Fix For: 3.2.0
>
>
> In CTAS, replace unaliased columns with named columns. Typically, a column 
> without an alias is the result of an operator such as sum or divide, which 
> leads to a schema check error.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37617) In CTAS, replace aliases with normal columns that satisfy the parquet schema

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457977#comment-17457977
 ] 

Apache Spark commented on SPARK-37617:
--

User 'monkeyboy123' has created a pull request for this issue:
https://github.com/apache/spark/pull/34872

> In CTAS, replace aliases with normal columns that satisfy the parquet schema
> --
>
> Key: SPARK-37617
> URL: https://issues.apache.org/jira/browse/SPARK-37617
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiahong.li
>Priority: Minor
> Fix For: 3.2.0
>
>
> In CTAS, replace unaliased columns with named columns. Typically, a column 
> without an alias is the result of an operator such as sum or divide, which 
> leads to a schema check error.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37617) In CTAS, replace aliases with normal columns that satisfy the parquet schema

2021-12-12 Thread jiahong.li (Jira)
jiahong.li created SPARK-37617:
--

 Summary: In CTAS, replace aliases with normal columns that satisfy 
the parquet schema
 Key: SPARK-37617
 URL: https://issues.apache.org/jira/browse/SPARK-37617
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: jiahong.li
 Fix For: 3.2.0


In CTAS, replace unaliased columns with named columns. Typically, a column 
without an alias is the result of an operator such as sum or divide, which 
leads to a schema check error.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37575) null values should be saved as nothing rather than quoted empty Strings "" by default settings

2021-12-12 Thread Guo Wei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Wei updated SPARK-37575:

Summary: null values should be saved as nothing rather than quoted empty 
Strings "" by default settings  (was: Null values are saved as quoted empty 
Strings "" rather than nothing)

> null values should be saved as nothing rather than quoted empty Strings "" by 
> default settings
> --
>
> Key: SPARK-37575
> URL: https://issues.apache.org/jira/browse/SPARK-37575
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 3.2.0
>Reporter: Guo Wei
>Priority: Major
>
> As mentioned in the SQL migration 
> guide([https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24]),
> {noformat}
> Since Spark 2.4, empty strings are saved as quoted empty strings "". In 
> version 2.3 and earlier, empty strings are equal to null values and do not 
> reflect to any characters in saved CSV files. For example, the row of "a", 
> null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as 
> a,,"",1. To restore the previous behavior, set the CSV option emptyValue to 
> empty (not quoted) string.{noformat}
>  
> In practice, however, both empty strings and null values are saved as quoted 
> empty strings "", rather than "" for empty strings and nothing for null 
> values.
> Code:
> {code:java}
> val data = List("spark", null, "").toDF("name")
> data.coalesce(1).write.csv("spark_csv_test")
> {code}
> Actual result:
> {noformat}
> line1: spark
> line2: ""
> line3: ""{noformat}
> Expected result:
> {noformat}
> line1: spark
> line2: 
> line3: ""
> {noformat}
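>
> A minimal sketch of the migration-guide workaround quoted above; whether it 
> also produces the expected output for null values is exactly what this 
> issue questions:
> {code:java}
> // emptyValue = "" (not quoted) is the documented way to restore the
> // pre-2.4 write behavior; nullValue controls what is written for nulls.
> data.coalesce(1).write.option("emptyValue", "").csv("spark_csv_test")
> {code}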



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37616) Support pushing down dynamic partition pruning from one join to other joins

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37616:


Assignee: (was: Apache Spark)

> Support pushing down dynamic partition pruning from one join to other joins
> -
>
> Key: SPARK-37616
> URL: https://issues.apache.org/jira/browse/SPARK-37616
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Support pushing down dynamic partition pruning from one join to other joins
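>
> As an illustrative sketch with hypothetical tables: today DPP derives a 
> partition filter only for the scan joined directly with the filtered 
> dimension, and this proposal would let the same pruning reach other joins 
> on the same partition key.
> {code:java}
>   // dim.flag = 'x' prunes partitions of fact1 through the first join today;
>   // the proposal would push the same pruning down to the fact2 join as well.
>   sql("SELECT * FROM fact1 JOIN dim ON fact1.pkey = dim.pkey " +
>     "JOIN fact2 ON fact1.pkey = fact2.pkey WHERE dim.flag = 'x'")
> {code}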



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37616) Support pushing down dynamic partition pruning from one join to other joins

2021-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37616:


Assignee: Apache Spark

> Support pushing down dynamic partition pruning from one join to other joins
> -
>
> Key: SPARK-37616
> URL: https://issues.apache.org/jira/browse/SPARK-37616
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Assignee: Apache Spark
>Priority: Major
>
> Support pushing down dynamic partition pruning from one join to other joins



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37616) Support pushing down dynamic partition pruning from one join to other joins

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457928#comment-17457928
 ] 

Apache Spark commented on SPARK-37616:
--

User 'weixiuli' has created a pull request for this issue:
https://github.com/apache/spark/pull/34871

> Support pushing down dynamic partition pruning from one join to other joins
> -
>
> Key: SPARK-37616
> URL: https://issues.apache.org/jira/browse/SPARK-37616
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Support pushing down dynamic partition pruning from one join to other joins



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37616) Support pushing down dynamic partition pruning from one join to other joins

2021-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457929#comment-17457929
 ] 

Apache Spark commented on SPARK-37616:
--

User 'weixiuli' has created a pull request for this issue:
https://github.com/apache/spark/pull/34871

> Support pushing down dynamic partition pruning from one join to other joins
> -
>
> Key: SPARK-37616
> URL: https://issues.apache.org/jira/browse/SPARK-37616
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Support pushing down dynamic partition pruning from one join to other joins



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37616) Support pushing down dynamic partition pruning from one join to other joins

2021-12-12 Thread weixiuli (Jira)
weixiuli created SPARK-37616:


 Summary: Support pushing down dynamic partition pruning from one 
join to other joins
 Key: SPARK-37616
 URL: https://issues.apache.org/jira/browse/SPARK-37616
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0, 3.1.2, 3.1.1
Reporter: weixiuli


Support pushing down dynamic partition pruning from one join to other joins



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org