[jira] [Commented] (SPARK-35924) Add Java 17 ea build test to GitHub action

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371076#comment-17371076
 ] 

Apache Spark commented on SPARK-35924:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33126

> Add Java 17 ea build test to GitHub action
> --
>
> Key: SPARK-35924
> URL: https://issues.apache.org/jira/browse/SPARK-35924
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-35924) Add Java 17 ea build test to GitHub action

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35924:


Assignee: Apache Spark

> Add Java 17 ea build test to GitHub action
> --
>
> Key: SPARK-35924
> URL: https://issues.apache.org/jira/browse/SPARK-35924
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-35924) Add Java 17 ea build test to GitHub action

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35924:


Assignee: (was: Apache Spark)

> Add Java 17 ea build test to GitHub action
> --
>
> Key: SPARK-35924
> URL: https://issues.apache.org/jira/browse/SPARK-35924
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-35924) Add Java 17 ea build test to GitHub action

2021-06-28 Thread William Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Hyun updated SPARK-35924:
-
Component/s: Tests

> Add Java 17 ea build test to GitHub action
> --
>
> Key: SPARK-35924
> URL: https://issues.apache.org/jira/browse/SPARK-35924
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Created] (SPARK-35924) Add Java 17 ea build test to GitHub action

2021-06-28 Thread William Hyun (Jira)
William Hyun created SPARK-35924:


 Summary: Add Java 17 ea build test to GitHub action
 Key: SPARK-35924
 URL: https://issues.apache.org/jira/browse/SPARK-35924
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: William Hyun









[jira] [Commented] (SPARK-35483) Add a new GA test job for the docker integration tests

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371063#comment-17371063
 ] 

Apache Spark commented on SPARK-35483:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33125

> Add a new GA test job for the docker integration tests
> --
>
> Key: SPARK-35483
> URL: https://issues.apache.org/jira/browse/SPARK-35483
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.0.2, 3.1.1, 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.2.0
>
>
> This ticket proposes to add a new GA test job for the integration tests.






[jira] [Commented] (SPARK-35483) Add a new GA test job for the docker integration tests

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371062#comment-17371062
 ] 

Apache Spark commented on SPARK-35483:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33125

> Add a new GA test job for the docker integration tests
> --
>
> Key: SPARK-35483
> URL: https://issues.apache.org/jira/browse/SPARK-35483
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.0.2, 3.1.1, 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.2.0
>
>
> This ticket proposes to add a new GA test job for the integration tests.






[jira] [Resolved] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4

2021-06-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35922.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33122
[https://github.com/apache/spark/pull/33122]

> Upgrade maven-shade-plugin to 3.2.4
> ---
>
> Key: SPARK-35922
> URL: https://issues.apache.org/jira/browse/SPARK-35922
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Assigned] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4

2021-06-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35922:
-

Assignee: Dongjoon Hyun

> Upgrade maven-shade-plugin to 3.2.4
> ---
>
> Key: SPARK-35922
> URL: https://issues.apache.org/jira/browse/SPARK-35922
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371041#comment-17371041
 ] 

Apache Spark commented on SPARK-34302:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33124

> Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
> 
>
> Key: SPARK-34302
> URL: https://issues.apache.org/jira/browse/SPARK-34302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.2.0
>
>
> # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
> # Remove AlterTableAlterColumnStatement
> # Remove the check verifyAlterTableType() from run()






[jira] [Commented] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371040#comment-17371040
 ] 

Apache Spark commented on SPARK-34302:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33124

> Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
> 
>
> Key: SPARK-34302
> URL: https://issues.apache.org/jira/browse/SPARK-34302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.2.0
>
>
> # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
> # Remove AlterTableAlterColumnStatement
> # Remove the check verifyAlterTableType() from run()






[jira] [Resolved] (SPARK-35920) Upgrade to Chill 0.10.0

2021-06-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35920.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33119
[https://github.com/apache/spark/pull/33119]

> Upgrade to Chill 0.10.0
> ---
>
> Key: SPARK-35920
> URL: https://issues.apache.org/jira/browse/SPARK-35920
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Assigned] (SPARK-35920) Upgrade to Chill 0.10.0

2021-06-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35920:
-

Assignee: Dongjoon Hyun

> Upgrade to Chill 0.10.0
> ---
>
> Key: SPARK-35920
> URL: https://issues.apache.org/jira/browse/SPARK-35920
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-35923) Coalesce empty partition with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35923:


Assignee: (was: Apache Spark)

> Coalesce empty partition with mixed CoalescedPartitionSpec and 
> PartialReducerPartitionSpec
> --
>
> Key: SPARK-35923
> URL: https://issues.apache.org/jira/browse/SPARK-35923
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Priority: Major
>
> Since [SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447), we 
> apply `OptimizeSkewedJoin` before `CoalesceShufflePartitions`. However, the 
> result depends on the order of these two rules.
> Say we have skewed partitions [0, 128MB, 0, 128MB, 0]:
>  # coalesce partitions first, then optimize skewed partitions:
>  [64MB, 64MB, 64MB, 64MB]
>  # optimize skewed partitions first, then coalesce partitions:
>  [0, 64MB, 64MB, 0, 64MB, 64MB, 0]
> So we can coalesce in ShufflePartitionsUtil.coalescePartitionsWithSkew with 
> mixed CoalescedPartitionSpec and PartialReducerPartitionSpec when the 
> CoalescedPartitionSpec is empty.






[jira] [Assigned] (SPARK-35923) Coalesce empty partition with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35923:


Assignee: Apache Spark

> Coalesce empty partition with mixed CoalescedPartitionSpec and 
> PartialReducerPartitionSpec
> --
>
> Key: SPARK-35923
> URL: https://issues.apache.org/jira/browse/SPARK-35923
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> Since [SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447), we 
> apply `OptimizeSkewedJoin` before `CoalesceShufflePartitions`. However, the 
> result depends on the order of these two rules.
> Say we have skewed partitions [0, 128MB, 0, 128MB, 0]:
>  # coalesce partitions first, then optimize skewed partitions:
>  [64MB, 64MB, 64MB, 64MB]
>  # optimize skewed partitions first, then coalesce partitions:
>  [0, 64MB, 64MB, 0, 64MB, 64MB, 0]
> So we can coalesce in ShufflePartitionsUtil.coalescePartitionsWithSkew with 
> mixed CoalescedPartitionSpec and PartialReducerPartitionSpec when the 
> CoalescedPartitionSpec is empty.






[jira] [Commented] (SPARK-35923) Coalesce empty partition with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371038#comment-17371038
 ] 

Apache Spark commented on SPARK-35923:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/33123

> Coalesce empty partition with mixed CoalescedPartitionSpec and 
> PartialReducerPartitionSpec
> --
>
> Key: SPARK-35923
> URL: https://issues.apache.org/jira/browse/SPARK-35923
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Priority: Major
>
> Since [SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447), we 
> apply `OptimizeSkewedJoin` before `CoalesceShufflePartitions`. However, the 
> result depends on the order of these two rules.
> Say we have skewed partitions [0, 128MB, 0, 128MB, 0]:
>  # coalesce partitions first, then optimize skewed partitions:
>  [64MB, 64MB, 64MB, 64MB]
>  # optimize skewed partitions first, then coalesce partitions:
>  [0, 64MB, 64MB, 0, 64MB, 64MB, 0]
> So we can coalesce in ShufflePartitionsUtil.coalescePartitionsWithSkew with 
> mixed CoalescedPartitionSpec and PartialReducerPartitionSpec when the 
> CoalescedPartitionSpec is empty.






[jira] [Created] (SPARK-35923) Coalesce empty partition with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec

2021-06-28 Thread XiDuo You (Jira)
XiDuo You created SPARK-35923:
-

 Summary: Coalesce empty partition with mixed 
CoalescedPartitionSpec and PartialReducerPartitionSpec
 Key: SPARK-35923
 URL: https://issues.apache.org/jira/browse/SPARK-35923
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: XiDuo You


Since [SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447), we 
apply `OptimizeSkewedJoin` before `CoalesceShufflePartitions`. However, the 
result depends on the order of these two rules.

Say we have skewed partitions [0, 128MB, 0, 128MB, 0]:
 # coalesce partitions first, then optimize skewed partitions:
 [64MB, 64MB, 64MB, 64MB]

 # optimize skewed partitions first, then coalesce partitions:
 [0, 64MB, 64MB, 0, 64MB, 64MB, 0]

So we can coalesce in ShufflePartitionsUtil.coalescePartitionsWithSkew with 
mixed CoalescedPartitionSpec and PartialReducerPartitionSpec when the 
CoalescedPartitionSpec is empty.
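To make the two layouts above concrete, here is a minimal plain-Python sketch. 
It is illustrative only (not Spark's ShufflePartitionsUtil code), and the 64MB 
advisory target size and the greedy merge are assumptions made for the example.
{code:python}
MB = 1024 * 1024
TARGET = 64 * MB  # assumed advisory partition size for this example
sizes = [0, 128 * MB, 0, 128 * MB, 0]

def coalesce(parts):
    """Greedily merge adjacent partitions until the target size is reached."""
    out, acc = [], 0
    for s in parts:
        acc += s
        if acc >= TARGET:
            out.append(acc)
            acc = 0
    if acc:
        out.append(acc)
    return out

def split_skewed(parts):
    """Split oversized partitions into target-sized pieces
    (the pieces correspond to PartialReducerPartitionSpec)."""
    out = []
    for s in parts:
        out.extend([TARGET] * (s // TARGET) if s > TARGET else [s])
    return out

# 1. coalesce first, then optimize the skewed partitions:
print([p // MB for p in split_skewed(coalesce(sizes))])  # [64, 64, 64, 64]

# 2. optimize the skewed partitions first; per the description, the coalesce
#    rule then leaves the empty CoalescedPartitionSpec partitions untouched
#    because they sit next to PartialReducerPartitionSpec pieces:
print([p // MB for p in split_skewed(sizes)])  # [0, 64, 64, 0, 64, 64, 0]
{code}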






[jira] [Assigned] (SPARK-35876) array_zip unexpected column names

2021-06-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35876:


Assignee: Kousuke Saruta

> array_zip unexpected column names
> -
>
> Key: SPARK-35876
> URL: https://issues.apache.org/jira/browse/SPARK-35876
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Derk Crezee
>Assignee: Kousuke Saruta
>Priority: Major
>
> When I'm using the arrays_zip function in combination with renamed columns, 
> I get an unexpected schema written to disk.
> {code:python}
> from pyspark.sql import *
> from pyspark.sql.functions import *
> spark = SparkSession.builder.getOrCreate()
> data = [
>   Row(a1=["a", "a"], b1=["b", "b"]),
> ]
> df = (
>   spark.sparkContext.parallelize(data).toDF()
>     .withColumnRenamed("a1", "a2")
>     .withColumnRenamed("b1", "b2")
>     .withColumn("zipped", arrays_zip(col("a2"), col("b2")))
> )
> df.printSchema()
> # root
> #  |-- a2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- b2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- zipped: array (nullable = true)
> #  |    |-- element: struct (containsNull = false)
> #  |    |    |-- a2: string (nullable = true)
> #  |    |    |-- b2: string (nullable = true)
> df.write.save("test.parquet")
> spark.read.load("test.parquet").printSchema()
> # root
> #  |-- a2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- b2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- zipped: array (nullable = true)
> #  |    |-- element: struct (containsNull = true)
> #  |    |    |-- a1: string (nullable = true)
> #  |    |    |-- b1: string (nullable = true)
> {code}
> I would expect the schema of the DataFrame written to disk to be the same as 
> that printed out. It seems that instead of using the renamed version of the 
> column names, it uses the old column names.
>  
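Until a fix is available, one possible caller-side workaround is sketched below. 
It assumes an explicit cast to the desired element type is acceptable; it is not 
the fix applied for this ticket, just a way to force the field names. It reuses 
the DataFrame {{df}} from the reproduction above.
{code:python}
from pyspark.sql.functions import arrays_zip, col

# Cast the zipped column to an explicit array<struct<...>> type so the struct
# field names written to disk match the renamed columns.
fixed = df.withColumn(
    "zipped",
    arrays_zip(col("a2"), col("b2")).cast("array<struct<a2: string, b2: string>>"),
)
fixed.write.mode("overwrite").save("test.parquet")
# spark.read.load("test.parquet").printSchema() should now show a2/b2 fields.
{code}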






[jira] [Resolved] (SPARK-35876) array_zip unexpected column names

2021-06-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35876.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33106
[https://github.com/apache/spark/pull/33106]

> array_zip unexpected column names
> -
>
> Key: SPARK-35876
> URL: https://issues.apache.org/jira/browse/SPARK-35876
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Derk Crezee
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.2.0
>
>
> When I'm using the arrays_zip function in combination with renamed columns, 
> I get an unexpected schema written to disk.
> {code:python}
> from pyspark.sql import *
> from pyspark.sql.functions import *
> spark = SparkSession.builder.getOrCreate()
> data = [
>   Row(a1=["a", "a"], b1=["b", "b"]),
> ]
> df = (
>   spark.sparkContext.parallelize(data).toDF()
>     .withColumnRenamed("a1", "a2")
>     .withColumnRenamed("b1", "b2")
>     .withColumn("zipped", arrays_zip(col("a2"), col("b2")))
> )
> df.printSchema()
> # root
> #  |-- a2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- b2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- zipped: array (nullable = true)
> #  |    |-- element: struct (containsNull = false)
> #  |    |    |-- a2: string (nullable = true)
> #  |    |    |-- b2: string (nullable = true)
> df.write.save("test.parquet")
> spark.read.load("test.parquet").printSchema()
> # root
> #  |-- a2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- b2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- zipped: array (nullable = true)
> #  |    |-- element: struct (containsNull = true)
> #  |    |    |-- a1: string (nullable = true)
> #  |    |    |-- b1: string (nullable = true)
> {code}
> I would expect the schema of the DataFrame written to disk to be the same as 
> that printed out. It seems that instead of using the renamed version of the 
> column names, it uses the old column names.
>  






[jira] [Assigned] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35922:


Assignee: (was: Apache Spark)

> Upgrade maven-shade-plugin to 3.2.4
> ---
>
> Key: SPARK-35922
> URL: https://issues.apache.org/jira/browse/SPARK-35922
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35922:


Assignee: (was: Apache Spark)

> Upgrade maven-shade-plugin to 3.2.4
> ---
>
> Key: SPARK-35922
> URL: https://issues.apache.org/jira/browse/SPARK-35922
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371014#comment-17371014
 ] 

Apache Spark commented on SPARK-35922:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33122

> Upgrade maven-shade-plugin to 3.2.4
> ---
>
> Key: SPARK-35922
> URL: https://issues.apache.org/jira/browse/SPARK-35922
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35922:


Assignee: Apache Spark

> Upgrade maven-shade-plugin to 3.2.4
> ---
>
> Key: SPARK-35922
> URL: https://issues.apache.org/jira/browse/SPARK-35922
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4

2021-06-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-35922:
-

 Summary: Upgrade maven-shade-plugin to 3.2.4
 Key: SPARK-35922
 URL: https://issues.apache.org/jira/browse/SPARK-35922
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-34537) Repartition miss/duplicated data

2021-06-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu resolved SPARK-34537.
---
Resolution: Not A Problem

> Repartition miss/duplicated data
> 
>
> Key: SPARK-34537
> URL: https://issues.apache.org/jira/browse/SPARK-34537
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2021-02-25-19-43-49-687.png, 
> image-2021-02-25-19-46-52-809.png, image-2021-02-25-19-47-10-005.png
>
>
> We have a SQL statement:
> {code:sql}
> INSERT OVERWRITE TABLE t1
> SELECT /*+ repartition(300) */ * FROM t2{code}
> Below are the SQL metrics of the repartition ShuffleExchange. We can see that 
> the shuffle records written and the records read are not the same.
> In the result table, some data is missing and some data is duplicated.
> !image-2021-02-25-19-43-49-687.png!
> !image-2021-02-25-19-46-52-809.png|width=408,height=654!!image-2021-02-25-19-47-10-005.png|width=282,height=414!
> We can see that *InsertIntoHadoopFsRelationCommand's output is the same as 
> the repartition Exchange's records read (reducer side)*
> *and the repartition Exchange's shuffle records written (mapper side) is the 
> same as the Filter's output.*
> *So the repartition Exchange returns wrong data.*
>  
> *In our environment, AQE and speculation are enabled.*
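For readers who hit similar symptoms, the sketch below shows one hedged 
mitigation idea: repartition by an explicit key instead of round-robin, so that 
retried or speculative tasks produce the same partitioning. This is an 
assumption-laden workaround, not the resolution recorded on this ticket; the 
table and column names are hypothetical and a running SparkSession {{spark}} is 
assumed.
{code:python}
df = spark.table("t2")
(df.repartition(300, "some_key")   # key-based shuffle instead of round-robin
   .write.mode("overwrite")
   .insertInto("t1"))
{code}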






[jira] [Commented] (SPARK-35914) Driver can't distribute task to executor because NullPointerException

2021-06-28 Thread Helt Long (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371009#comment-17371009
 ] 

Helt Long commented on SPARK-35914:
---

I guess this problem is related to the Hadoop version. I use CDH 5.7.1 with 
Hadoop 2.6.5, while Spark 3 is built against Hadoop 2.7. The other problem I 
found in the Spark web UI was also caused by the Hadoop version. I will try a 
higher Hadoop version to confirm it.

[SPARK-35802] Error loading the stages/stage/ page in spark UI - ASF JIRA 
(apache.org)

> Driver can't distribute task to executor because NullPointerException
> -
>
> Key: SPARK-35914
> URL: https://issues.apache.org/jira/browse/SPARK-35914
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark 3.0.1, 3.1.1, 3.1.2
>Reporter: Helt Long
>Priority: Major
> Attachments: stuck log.png, webui stuck.png
>
>
> When I use Spark 3 to submit a job to a YARN cluster, I occasionally hit a 
> problem: the driver can't distribute any tasks to any executors, so the stage 
> gets stuck and the whole Spark job gets stuck. Checking the driver log, I 
> found a NullPointerException. It looks like a Netty problem. I can confirm 
> this problem only exists in Spark 3, because it never happened with Spark 2.
>  
> {code:java}
> // Error message
> 21/06/28 14:42:43 INFO TaskSetManager: Starting task 2592.0 in stage 1.0 (TID 
> 3494) (worker39.hadoop, executor 84, partition 2592, RACK_LOCAL, 5006 bytes) 
> taskResourceAssignments Map()
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 4155.0 in stage 1.0 (TID 
> 3367) in 36670 ms on worker39.hadoop (executor 84) (3278/4249)
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 2283.0 in stage 1.0 (TID 
> 3422) in 22371 ms on worker15.hadoop (executor 109) (3279/4249)
> 21/06/28 14:42:43 ERROR Inbox: Ignoring error
> java.lang.NullPointerException
>   at java.lang.String.length(String.java:623)
>   at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:420)
>   at java.lang.StringBuilder.append(StringBuilder.java:136)
>   at 
> org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$5(TaskSetManager.scala:483)
>   at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
>   at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
>   at 
> org.apache.spark.scheduler.TaskSetManager.logInfo(TaskSetManager.scala:54)
>   at 
> org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:484)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:444)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:397)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:392)
>   at scala.Option.foreach(Option.scala:407)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:392)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:383)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:581)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:576)
>   at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:576)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:547)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:547)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$5(CoarseGrainedSchedulerBackend.scala:340)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:904)
>   at 
> 

[jira] [Assigned] (SPARK-35921) ${spark.yarn.isHadoopProvided} in config.properties is not edited if build with SBT

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35921:


Assignee: Apache Spark  (was: Kousuke Saruta)

> ${spark.yarn.isHadoopProvided} in config.properties is not edited if build 
> with SBT
> ---
>
> Key: SPARK-35921
> URL: https://issues.apache.org/jira/browse/SPARK-35921
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> The yarn sub-module contains config.properties:
> {code}
> spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided}
> {code}
> The ${spark.yarn.isHadoopProvided} part is replaced with true or false at 
> build time, depending on whether Hadoop is provided (specified by 
> -Phadoop-provided).
> The edited config.properties is loaded at runtime to control how the 
> Hadoop-related classpath is populated.
> This process works when we build with Maven, but not with SBT.






[jira] [Assigned] (SPARK-35921) ${spark.yarn.isHadoopProvided} in config.properties is not edited if build with SBT

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35921:


Assignee: Kousuke Saruta  (was: Apache Spark)

> ${spark.yarn.isHadoopProvided} in config.properties is not edited if build 
> with SBT
> ---
>
> Key: SPARK-35921
> URL: https://issues.apache.org/jira/browse/SPARK-35921
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> The yarn sub-module contains config.properties:
> {code}
> spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided}
> {code}
> The ${spark.yarn.isHadoopProvided} part is replaced with true or false at 
> build time, depending on whether Hadoop is provided (specified by 
> -Phadoop-provided).
> The edited config.properties is loaded at runtime to control how the 
> Hadoop-related classpath is populated.
> This process works when we build with Maven, but not with SBT.






[jira] [Commented] (SPARK-35921) ${spark.yarn.isHadoopProvided} in config.properties is not edited if build with SBT

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371007#comment-17371007
 ] 

Apache Spark commented on SPARK-35921:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33121

> ${spark.yarn.isHadoopProvided} in config.properties is not edited if build 
> with SBT
> ---
>
> Key: SPARK-35921
> URL: https://issues.apache.org/jira/browse/SPARK-35921
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> The yarn sub-module contains config.properties:
> {code}
> spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided}
> {code}
> The ${spark.yarn.isHadoopProvided} part is replaced with true or false at 
> build time, depending on whether Hadoop is provided (specified by 
> -Phadoop-provided).
> The edited config.properties is loaded at runtime to control how the 
> Hadoop-related classpath is populated.
> This process works when we build with Maven, but not with SBT.






[jira] [Resolved] (SPARK-35802) Error loading the stages/stage/ page in spark UI

2021-06-28 Thread Helt Long (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Helt Long resolved SPARK-35802.
---
Resolution: Not A Bug

I tried Hadoop 2.7.5 and the problem does not exist there, so I can confirm it 
was caused by my use of Hadoop 2.6.5. Sorry for this; I closed it.

> Error loading the stages/stage/ page in spark UI
> 
>
> Key: SPARK-35802
> URL: https://issues.apache.org/jira/browse/SPARK-35802
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark on yarn cluster mode
>Reporter: Helt Long
>Priority: Major
> Attachments: spark3.1.2-request-20210628093538.png, 
> spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png
>
>
> When I try to load the Spark UI page for a specific stage, I get the 
> following error:
> {quote}Unable to connect to the server. Looks like the Spark application must 
> have ended. Please Switch to the history UI.
> {quote}
> Obviously the server is still alive and processes new messages.
> Looking at the network tab shows that one of the requests fails:
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'}}
> returns an HTML error page (markup stripped):
> {quote}
> Error 500 Request failed.
> HTTP ERROR 500
> Problem accessing 
> /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason:
> Request failed.
> Powered by Jetty:// 9.4.z-SNAPSHOT
> {quote}
> Requests to any other endpoint that I've tested seem to work, for example:
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}
>  
> The exception is:
> {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
>  javax.servlet.ServletException: java.lang.NullPointerException
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>  at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>  at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>  at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>  at org.sparkproject.jetty.server.Server.handle(Server.java:505)
>  at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>  at 
> org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>  at 
> org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>  at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
>  at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>  at 
> org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>  at 
> 

[jira] [Assigned] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework

2021-06-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34302:
---

Assignee: Terry Kim  (was: Max Gekk)

> Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
> 
>
> Key: SPARK-34302
> URL: https://issues.apache.org/jira/browse/SPARK-34302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.2.0
>
>
> # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
> # Remove AlterTableAlterColumnStatement
> # Remove the check verifyAlterTableType() from run()






[jira] [Resolved] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework

2021-06-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34302.
-
Resolution: Fixed

Issue resolved by pull request 33113
[https://github.com/apache/spark/pull/33113]

> Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
> 
>
> Key: SPARK-34302
> URL: https://issues.apache.org/jira/browse/SPARK-34302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
> # Remove AlterTableAlterColumnStatement
> # Remove the check verifyAlterTableType() from run()






[jira] [Resolved] (SPARK-35888) Add dataSize field in CoalescedPartitionSpec

2021-06-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35888.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33079
[https://github.com/apache/spark/pull/33079]

> Add dataSize field in CoalescedPartitionSpec
> 
>
> Key: SPARK-35888
> URL: https://issues.apache.org/jira/browse/SPARK-35888
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, the test suites for `CoalescedPartitionSpec` do not check the 
> data size because it doesn't contain a data size field.
> We can add a data size field to `CoalescedPartitionSpec` and then add test 
> cases for better coverage.






[jira] [Assigned] (SPARK-35888) Add dataSize field in CoalescedPartitionSpec

2021-06-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35888:
---

Assignee: XiDuo You

> Add dataSize field in CoalescedPartitionSpec
> 
>
> Key: SPARK-35888
> URL: https://issues.apache.org/jira/browse/SPARK-35888
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
>
> Currently, the test suites for `CoalescedPartitionSpec` do not check the 
> data size because it doesn't contain a data size field.
> We can add a data size field to `CoalescedPartitionSpec` and then add test 
> cases for better coverage.






[jira] [Updated] (SPARK-35921) ${spark.yarn.isHadoopProvided} in config.properties is not edited if build with SBT

2021-06-28 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-35921:
---
Summary: ${spark.yarn.isHadoopProvided} in config.properties is not edited 
if build with SBT  (was: The value of spark.yarn.isHadoopProvided property in 
config.properties is not edited if build with SBT)

> ${spark.yarn.isHadoopProvided} in config.properties is not edited if build 
> with SBT
> ---
>
> Key: SPARK-35921
> URL: https://issues.apache.org/jira/browse/SPARK-35921
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> The yarn sub-module contains config.properties:
> {code}
> spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided}
> {code}
> The ${spark.yarn.isHadoopProvided} part is replaced with true or false at 
> build time, depending on whether Hadoop is provided (specified by 
> -Phadoop-provided).
> The edited config.properties is loaded at runtime to control how the 
> Hadoop-related classpath is populated.
> This process works when we build with Maven, but not with SBT.






[jira] [Created] (SPARK-35921) The value of spark.yarn.isHadoopProvided property in config.properties is not edited if build with SBT

2021-06-28 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-35921:
--

 Summary: The value of spark.yarn.isHadoopProvided property in 
config.properties is not edited if build with SBT
 Key: SPARK-35921
 URL: https://issues.apache.org/jira/browse/SPARK-35921
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


The yarn sub-module contains config.properties:
{code}
spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided}
{code}

The ${spark.yarn.isHadoopProvided} part is replaced with true or false at build 
time, depending on whether Hadoop is provided (specified by -Phadoop-provided).
The edited config.properties is loaded at runtime to control how the 
Hadoop-related classpath is populated.

This process works when we build with Maven, but not with SBT.
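For illustration, the substitution the build is expected to perform can be 
sketched as below. This mimics Maven resource filtering in plain Python and is 
not the actual Maven/SBT mechanism; {{hadoop_provided}} stands in for the 
-Phadoop-provided flag.
{code:python}
template = "spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided}\n"
hadoop_provided = False  # assumption: -Phadoop-provided was not specified
edited = template.replace("${spark.yarn.isHadoopProvided}",
                          "true" if hadoop_provided else "false")
print(edited, end="")  # spark.yarn.isHadoopProvided = false
{code}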






[jira] [Commented] (SPARK-26764) [SPIP] Spark Relational Cache

2021-06-28 Thread Adrian Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370993#comment-17370993
 ] 

Adrian Wang commented on SPARK-26764:
-

[~zshao] Thanks for the interest. We created an open-source plugin: 
[https://github.com/alibaba/SparkCube], to demonstrate the basic ideas.

> [SPIP] Spark Relational Cache
> -
>
> Key: SPARK-26764
> URL: https://issues.apache.org/jira/browse/SPARK-26764
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Adrian Wang
>Priority: Major
> Attachments: Relational+Cache+SPIP.pdf
>
>
> In modern database systems, a relational cache is a common technique for 
> speeding up ad-hoc queries. While Spark provides caching natively, Spark SQL 
> should be able to exploit the relationships between relations to speed up all 
> possible queries. In this SPIP, we will make Spark able to use all defined 
> cached relations where possible, without explicit substitution in the user 
> query, and keep some user-defined caches available across sessions. 
> Materialized views in many database systems provide a similar function.






[jira] [Commented] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370991#comment-17370991
 ] 

Apache Spark commented on SPARK-35899:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/33120

> Add a utility to convert connector expressions to Catalyst expressions
> --
>
> Key: SPARK-35899
> URL: https://issues.apache.org/jira/browse/SPARK-35899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.2.0
>
>
> There are more and more places that require converting a v2 connector 
> expression to an internal Catalyst expression. We need to build a utility 
> method to avoid having the same logic in a lot of places.
> See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].






[jira] [Commented] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370990#comment-17370990
 ] 

Apache Spark commented on SPARK-35899:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/33120

> Add a utility to convert connector expressions to Catalyst expressions
> --
>
> Key: SPARK-35899
> URL: https://issues.apache.org/jira/browse/SPARK-35899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.2.0
>
>
> There are more and more places that require converting a v2 connector 
> expression to an internal Catalyst expression. We need to build a utility 
> method to avoid having the same logic in a lot of places.
> See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].






[jira] [Resolved] (SPARK-33898) Support SHOW CREATE TABLE in v2

2021-06-28 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-33898.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32931
[https://github.com/apache/spark/pull/32931]

> Support SHOW CREATE TABLE in v2
> ---
>
> Key: SPARK-33898
> URL: https://issues.apache.org/jira/browse/SPARK-33898
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: PengLei
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Assigned] (SPARK-33898) Support SHOW CREATE TABLE in v2

2021-06-28 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-33898:


Assignee: PengLei

> Support SHOW CREATE TABLE in v2
> ---
>
> Key: SPARK-33898
> URL: https://issues.apache.org/jira/browse/SPARK-33898
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: PengLei
>Priority: Major
>







[jira] [Resolved] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark

2021-06-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-35344.
---
Fix Version/s: 3.2.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 32955
https://github.com/apache/spark/pull/32955

> Support creating a Column of numpy literal value in pandas-on-Spark
> ---
>
> Key: SPARK-35344
> URL: https://issues.apache.org/jira/browse/SPARK-35344
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> Spark (the `lit` function defined in `pyspark.sql.functions`) doesn't support 
> creating a Column out of a numpy literal value.
> So the `lit` function defined in `pyspark.pandas.spark.functions` should be 
> adjusted in order to support that in pandas-on-Spark.
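A minimal sketch of the gap is shown below, assuming a running SparkSession 
{{spark}}; converting the numpy scalar to a Python scalar with .item() is a 
common workaround, not the pandas-on-Spark change this ticket implements.
{code:python}
import numpy as np
from pyspark.sql import functions as F

value = np.int64(42)
# F.lit(value)               # numpy scalars are not accepted here
column = F.lit(value.item())  # .item() yields a plain Python int, which works
spark.range(1).select(column.alias("value")).show()
{code}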






[jira] [Commented] (SPARK-35920) Upgrade to Chill 0.10.0

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370971#comment-17370971
 ] 

Apache Spark commented on SPARK-35920:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33119

> Upgrade to Chill 0.10.0
> ---
>
> Key: SPARK-35920
> URL: https://issues.apache.org/jira/browse/SPARK-35920
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-35920) Upgrade to Chill 0.10.0

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35920:


Assignee: (was: Apache Spark)

> Upgrade to Chill 0.10.0
> ---
>
> Key: SPARK-35920
> URL: https://issues.apache.org/jira/browse/SPARK-35920
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-35920) Upgrade to Chill 0.10.0

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370970#comment-17370970
 ] 

Apache Spark commented on SPARK-35920:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33119

> Upgrade to Chill 0.10.0
> ---
>
> Key: SPARK-35920
> URL: https://issues.apache.org/jira/browse/SPARK-35920
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35920) Upgrade to Chill 0.10.0

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35920:


Assignee: Apache Spark

> Upgrade to Chill 0.10.0
> ---
>
> Key: SPARK-35920
> URL: https://issues.apache.org/jira/browse/SPARK-35920
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35920) Upgrade to Chill 0.10.0

2021-06-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-35920:
-

 Summary: Upgrade to Chill 0.10.0
 Key: SPARK-35920
 URL: https://issues.apache.org/jira/browse/SPARK-35920
 Project: Spark
  Issue Type: Task
  Components: Build
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35919) Support pathlib.PurePath-like objects in DataFrameReader / DataFrameWriter

2021-06-28 Thread Andrew Grigorev (Jira)
Andrew Grigorev created SPARK-35919:
---

 Summary: Support pathlib.PurePath-like objects in DataFrameReader 
/ DataFrameWriter
 Key: SPARK-35919
 URL: https://issues.apache.org/jira/browse/SPARK-35919
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.1.2, 2.4.8
Reporter: Andrew Grigorev


It would be nice to support Path objects in the 
`spark.\{read,write}.\{parquet,orc,csv,...etc}` methods.

Without changes to the pyspark source code, this currently seems possible only 
through ugly monkeypatching hacks - https://stackoverflow.com/q/68170685/2649222.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35917) Disable push-based shuffle until the feature is complete

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35917:


Assignee: Apache Spark

> Disable push-based shuffle until the feature is complete
> 
>
> Key: SPARK-35917
> URL: https://issues.apache.org/jira/browse/SPARK-35917
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Assignee: Apache Spark
>Priority: Major
>
> Push-based shuffle is partially merged in Apache master, but some of the tasks 
> are still incomplete. Since the 3.2 branch is going to be cut soon, we will not 
> be able to get the pending tasks reviewed and merged. A few of the pending tasks 
> make protocol changes to the push-based shuffle protocols, so we would like to 
> prevent users from enabling push-based shuffle on both the client and the 
> server until the push-based shuffle implementation is complete. 
> We can prevent push-based shuffle from being used by throwing 
> {{UnsupportedOperationException}} (or something like that) on both the client 
> and the server when the user tries to enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35917) Disable push-based shuffle until the feature is complete

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35917:


Assignee: (was: Apache Spark)

> Disable push-based shuffle until the feature is complete
> 
>
> Key: SPARK-35917
> URL: https://issues.apache.org/jira/browse/SPARK-35917
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> Push-based shuffle is partially merged in Apache master, but some of the tasks 
> are still incomplete. Since the 3.2 branch is going to be cut soon, we will not 
> be able to get the pending tasks reviewed and merged. A few of the pending tasks 
> make protocol changes to the push-based shuffle protocols, so we would like to 
> prevent users from enabling push-based shuffle on both the client and the 
> server until the push-based shuffle implementation is complete. 
> We can prevent push-based shuffle from being used by throwing 
> {{UnsupportedOperationException}} (or something like that) on both the client 
> and the server when the user tries to enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35917) Disable push-based shuffle until the feature is complete

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370877#comment-17370877
 ] 

Apache Spark commented on SPARK-35917:
--

User 'otterc' has created a pull request for this issue:
https://github.com/apache/spark/pull/33118

> Disable push-based shuffle until the feature is complete
> 
>
> Key: SPARK-35917
> URL: https://issues.apache.org/jira/browse/SPARK-35917
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> Push-based shuffle is partially merged in Apache master, but some of the tasks 
> are still incomplete. Since the 3.2 branch is going to be cut soon, we will not 
> be able to get the pending tasks reviewed and merged. A few of the pending tasks 
> make protocol changes to the push-based shuffle protocols, so we would like to 
> prevent users from enabling push-based shuffle on both the client and the 
> server until the push-based shuffle implementation is complete. 
> We can prevent push-based shuffle from being used by throwing 
> {{UnsupportedOperationException}} (or something like that) on both the client 
> and the server when the user tries to enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-28 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35881:
---
Description: 
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method is always assumed to be row-based. The only way to 
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
supportsColumnar method also always returns false.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then determine if that plan is 
columnar or not, and then call the appropriate doExecute method, bypassing the 
doExecute methods on AdaptiveSparkPlanExec. We would like a supported mechanism 
for executing a columnar AQE plan so that we do not need to use reflection.

 

 

 

 

  was:
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method is always assumed to be row-based. The only way to 
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, 
which is another limitation. However, AQE is special because we don't know if 
the final stage will be columnar or not until the child stages have been 
executed and the final stage has been re-planned and re-optimized, so we can't 
easily change the behavior of supportsColumnar. We can't just implement 
doExecuteColumnar because we don't know whether the final stage will be 
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then determine if that plan is 
columnar or not, and then call the appropriate doExecute method, bypassing 
the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality. We also need a mechanism for invoking 
finalPlanUpdate after the query has been executed.

 

 

 


> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Andy Grove
>Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method is always assumed to be row-based. The only way 
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
> supportsColumnar method also always returns false.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, bypassing 
> the doExecute methods on AdaptiveSparkPlanExec. We would like a supported 
> mechanism for executing a columnar AQE plan so that we do not need to use 
> reflection.
>  
>  
>  
>  
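For illustration only, a rough sketch of the reflection workaround mentioned above (the helper below is an assumption for this sketch, not code from the RAPIDS Accelerator):

{code:scala}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec

// Obtain the re-optimized final stage via reflection, then dispatch on
// supportsColumnar ourselves, since AdaptiveSparkPlanExec only offers the
// row-based doExecute path.
def executeFinalStage(aqe: AdaptiveSparkPlanExec): RDD[_] = {
  val m = classOf[AdaptiveSparkPlanExec].getDeclaredMethod("getFinalPhysicalPlan")
  m.setAccessible(true)
  val finalPlan = m.invoke(aqe).asInstanceOf[SparkPlan]
  // execute() returns RDD[InternalRow]; executeColumnar() returns RDD[ColumnarBatch].
  if (finalPlan.supportsColumnar) finalPlan.executeColumnar() else finalPlan.execute()
}
{code}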



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35918) Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch handling and error messages

2021-06-28 Thread Erik Krogen (Jira)
Erik Krogen created SPARK-35918:
---

 Summary: Consolidate logic between AvroSerializer/AvroDeserializer 
for schema mismatch handling and error messages
 Key: SPARK-35918
 URL: https://issues.apache.org/jira/browse/SPARK-35918
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2
Reporter: Erik Krogen


While working on [PR #31490|https://github.com/apache/spark/pull/31490] for 
SPARK-34365, we discussed that there is room for improvement in how schema 
mismatch errors are reported 
([comment1|https://github.com/apache/spark/pull/31490#discussion_r659970793], 
[comment2|https://github.com/apache/spark/pull/31490#issuecomment-869866848]). 
We can also consolidate more of the logic between AvroSerializer and 
AvroDeserializer to avoid some duplication of error handling and consolidate 
how these error messages are generated.

This will essentially be taking the [logic from the initial proposal from PR 
#31490|https://github.com/apache/spark/pull/31490/commits/83a922fdff08528e59233f67930ac78bfb3fa178],
 but applied separately from the current set of proposed changes to cut down on 
PR size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35910) Update remoteBlockBytes based on merged block info

2021-06-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35910:
-

Assignee: Kent Yao

> Update remoteBlockBytes based on merged block info
> --
>
> Key: SPARK-35910
> URL: https://issues.apache.org/jira/browse/SPARK-35910
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> Currently, we calculate `remoteBlockBytes` based on the original block info 
> list. If the original reducer size is big but the actual reducer size is small 
> due to AQE's automatic partition coalescing, the reducer will take more time to 
> calculate `remoteBlockBytes`. We can reduce this cost via remote requests that 
> contain merged block info lists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35910) Update remoteBlockBytes based on merged block info

2021-06-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35910.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33109
[https://github.com/apache/spark/pull/33109]

> Update remoteBlockBytes based on merged block info
> --
>
> Key: SPARK-35910
> URL: https://issues.apache.org/jira/browse/SPARK-35910
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, we calculate `remoteBlockBytes` based on the original block info 
> list. If the original reducer size is big but the actual reducer size is small 
> due to AQE's automatic partition coalescing, the reducer will take more time to 
> calculate `remoteBlockBytes`. We can reduce this cost via remote requests that 
> contain merged block info lists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35859) Cleanup type hints.

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370842#comment-17370842
 ] 

Apache Spark commented on SPARK-35859:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33117

> Cleanup type hints.
> ---
>
> Key: SPARK-35859
> URL: https://issues.apache.org/jira/browse/SPARK-35859
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> - Consolidate the declaration of type vars, type aliases, etc.
> - Rename type vars, like {{T_Frame}}, {{T_IndexOps}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35859) Cleanup type hints.

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35859:


Assignee: Apache Spark

> Cleanup type hints.
> ---
>
> Key: SPARK-35859
> URL: https://issues.apache.org/jira/browse/SPARK-35859
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> - Consolidate the declaration of type vars, type aliases, etc.
> - Rename type vars, like {{T_Frame}}, {{T_IndexOps}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35859) Cleanup type hints.

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35859:


Assignee: (was: Apache Spark)

> Cleanup type hints.
> ---
>
> Key: SPARK-35859
> URL: https://issues.apache.org/jira/browse/SPARK-35859
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> - Consolidate the declaration of type vars, type aliases, etc.
> - Rename type vars, like {{T_Frame}}, {{T_IndexOps}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35859) Cleanup type hints.

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370840#comment-17370840
 ] 

Apache Spark commented on SPARK-35859:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33117

> Cleanup type hints.
> ---
>
> Key: SPARK-35859
> URL: https://issues.apache.org/jira/browse/SPARK-35859
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> - Consolidate the declaration of type vars, type aliases, etc.
> - Rename type vars, like {{T_Frame}}, {{T_IndexOps}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370792#comment-17370792
 ] 

Apache Spark commented on SPARK-35259:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/33116

> ExternalBlockHandler metrics have misleading unit in the name
> -
>
> Key: SPARK-35259
> URL: https://issues.apache.org/jira/browse/SPARK-35259
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Priority: Major
>
> Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
> {code}
> // Time latency for open block request in ms
> private final Timer openBlockRequestLatencyMillis = new Timer();
> // Time latency for executor registration latency in ms
> private final Timer registerExecutorRequestLatencyMillis = new Timer();
> // Time latency for processing fetch merged blocks meta request latency 
> in ms
> private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer();
> // Time latency for processing finalize shuffle merge request latency in 
> ms
> private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
> {code}
> However these Dropwizard Timers by default use nanoseconds 
> ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
>  It's certainly possible to extract milliseconds from them, but it seems 
> misleading to have millis in the name here.
> This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics 
> like {{openBlockRequestLatencyMillis_count}} and 
> {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics 
> exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust 
> the name accordingly, so the unit shouldn't be included in the name of the 
> metric itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35259:


Assignee: Apache Spark

> ExternalBlockHandler metrics have misleading unit in the name
> -
>
> Key: SPARK-35259
> URL: https://issues.apache.org/jira/browse/SPARK-35259
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Assignee: Apache Spark
>Priority: Major
>
> Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
> {code}
> // Time latency for open block request in ms
> private final Timer openBlockRequestLatencyMillis = new Timer();
> // Time latency for executor registration latency in ms
> private final Timer registerExecutorRequestLatencyMillis = new Timer();
> // Time latency for processing fetch merged blocks meta request latency 
> in ms
> private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer();
> // Time latency for processing finalize shuffle merge request latency in 
> ms
> private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
> {code}
> However these Dropwizard Timers by default use nanoseconds 
> ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
>  It's certainly possible to extract milliseconds from them, but it seems 
> misleading to have millis in the name here.
> This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics 
> like {{openBlockRequestLatencyMillis_count}} and 
> {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics 
> exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust 
> the name accordingly, so the unit shouldn't be included in the name of the 
> metric itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370790#comment-17370790
 ] 

Apache Spark commented on SPARK-35259:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/33116

> ExternalBlockHandler metrics have misleading unit in the name
> -
>
> Key: SPARK-35259
> URL: https://issues.apache.org/jira/browse/SPARK-35259
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Priority: Major
>
> Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
> {code}
> // Time latency for open block request in ms
> private final Timer openBlockRequestLatencyMillis = new Timer();
> // Time latency for executor registration latency in ms
> private final Timer registerExecutorRequestLatencyMillis = new Timer();
> // Time latency for processing fetch merged blocks meta request latency 
> in ms
> private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer();
> // Time latency for processing finalize shuffle merge request latency in 
> ms
> private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
> {code}
> However these Dropwizard Timers by default use nanoseconds 
> ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
>  It's certainly possible to extract milliseconds from them, but it seems 
> misleading to have millis in the name here.
> This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics 
> like {{openBlockRequestLatencyMillis_count}} and 
> {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics 
> exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust 
> the name accordingly, so the unit shouldn't be included in the name of the 
> metric itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35259:


Assignee: (was: Apache Spark)

> ExternalBlockHandler metrics have misleading unit in the name
> -
>
> Key: SPARK-35259
> URL: https://issues.apache.org/jira/browse/SPARK-35259
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Priority: Major
>
> Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
> {code}
> // Time latency for open block request in ms
> private final Timer openBlockRequestLatencyMillis = new Timer();
> // Time latency for executor registration latency in ms
> private final Timer registerExecutorRequestLatencyMillis = new Timer();
> // Time latency for processing fetch merged blocks meta request latency 
> in ms
> private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer();
> // Time latency for processing finalize shuffle merge request latency in 
> ms
> private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
> {code}
> However these Dropwizard Timers by default use nanoseconds 
> ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
>  It's certainly possible to extract milliseconds from them, but it seems 
> misleading to have millis in the name here.
> This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics 
> like {{openBlockRequestLatencyMillis_count}} and 
> {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics 
> exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust 
> the name accordingly, so the unit shouldn't be included in the name of the 
> metric itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name

2021-06-28 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated SPARK-35259:

Description: 
Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
{code}
// Time latency for open block request in ms
private final Timer openBlockRequestLatencyMillis = new Timer();
// Time latency for executor registration latency in ms
private final Timer registerExecutorRequestLatencyMillis = new Timer();
// Time latency for processing fetch merged blocks meta request latency in 
ms
private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer();
// Time latency for processing finalize shuffle merge request latency in ms
private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
{code}
However these Dropwizard Timers by default use nanoseconds 
([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
 It's certainly possible to extract milliseconds from them, but it seems 
misleading to have millis in the name here.

This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics 
like {{openBlockRequestLatencyMillis_count}} and 
{{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics 
exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust the 
name accordingly, so the unit shouldn't be included in the name of the metric 
itself.

  was:
Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
{code}
// Time latency for open block request in ms
private final Timer openBlockRequestLatencyMillis = new Timer();
// Time latency for executor registration latency in ms
private final Timer registerExecutorRequestLatencyMillis = new Timer();
// Time latency for processing finalize shuffle merge request latency in ms
private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
{code}
However these Dropwizard Timers by default use nanoseconds 
([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
 It's certainly possible to extract milliseconds from them, but it seems 
misleading to have millis in the name here.

{{YarnShuffleServiceMetrics}} currently doesn't expose any incorrectly-named 
metrics since it doesn't export any timing information from these metrics 
(which I am trying to address in SPARK-35258), but these names still result in 
kind of misleading metric names like 
{{finalizeShuffleMergeLatencyMillis_count}} -- a count doesn't have a unit. It 
should be up to the metrics exporter, like {{YarnShuffleServiceMetrics}}, to 
decide the unit and adjust the name accordingly.


> ExternalBlockHandler metrics have misleading unit in the name
> -
>
> Key: SPARK-35259
> URL: https://issues.apache.org/jira/browse/SPARK-35259
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Priority: Major
>
> Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
> {code}
> // Time latency for open block request in ms
> private final Timer openBlockRequestLatencyMillis = new Timer();
> // Time latency for executor registration latency in ms
> private final Timer registerExecutorRequestLatencyMillis = new Timer();
> // Time latency for processing fetch merged blocks meta request latency 
> in ms
> private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer();
> // Time latency for processing finalize shuffle merge request latency in 
> ms
> private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
> {code}
> However these Dropwizard Timers by default use nanoseconds 
> ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
>  It's certainly possible to extract milliseconds from them, but it seems 
> misleading to have millis in the name here.
> This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics 
> like {{openBlockRequestLatencyMillis_count}} and 
> {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics 
> exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust 
> the name accordingly, so the unit shouldn't be included in the name of the 
> metric itself.
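For illustration, a minimal sketch of the exporter-side conversion argued for above; the exported names and the map-based output here are assumptions, not the YarnShuffleServiceMetrics implementation:

{code:scala}
import com.codahale.metrics.Timer

// Dropwizard Timer snapshots report durations in nanoseconds by default, so a
// hypothetical exporter chooses the unit and encodes it only in the *exported*
// name, leaving the Timer's own name unit-free.
def exportTimer(name: String, timer: Timer): Map[String, Double] = {
  val snapshot = timer.getSnapshot
  Map(
    s"${name}_count" -> timer.getCount.toDouble,        // plain count, no unit
    s"${name}_avgTimeMillis" -> snapshot.getMean / 1e6  // nanoseconds -> milliseconds
  )
}
{code}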



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26764) [SPIP] Spark Relational Cache

2021-06-28 Thread Zheng Shao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370770#comment-17370770
 ] 

Zheng Shao commented on SPARK-26764:


[~adrian-wang] It has been over 2 years since this issue was created.  Can you 
give us an update on the latest status of this effort so far?

> [SPIP] Spark Relational Cache
> -
>
> Key: SPARK-26764
> URL: https://issues.apache.org/jira/browse/SPARK-26764
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Adrian Wang
>Priority: Major
> Attachments: Relational+Cache+SPIP.pdf
>
>
> In modern database systems, a relational cache is a common technology to boost 
> ad-hoc queries. While Spark provides caching natively, Spark SQL should be able 
> to utilize the relationships between relations to boost all possible queries. 
> In this SPIP, we will enable Spark to utilize all defined cached relations where 
> possible, without explicit substitution in the user query, as well as keep some 
> user-defined caches available across different sessions. Materialized views in 
> many database systems provide a similar function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View

2021-06-28 Thread Zheng Shao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370768#comment-17370768
 ] 

Zheng Shao commented on SPARK-29038:


[~cltlfcjin] and [~AidenZhang]. I also recently started to look at materialized 
views.  This is a huge opportunity for us to improve query performance.

It has been almost a year since the last update.  Are there any new updates 
from your side?

> SPIP: Support Spark Materialized View
> -
>
> Key: SPARK-29038
> URL: https://issues.apache.org/jira/browse/SPARK-29038
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Lantao Jin
>Priority: Major
>
> Materialized views are an important approach in DBMSs for caching data to 
> accelerate queries. By creating a materialized view through SQL, the data that 
> can be cached is very flexible and can be configured for specific usage 
> scenarios. The Materialization Manager automatically updates the cached data 
> according to changes in the detail source tables, simplifying user work. When a 
> user submits a query, the Spark optimizer rewrites the execution plan based on 
> the available materialized views to determine the optimal execution plan.
> Details in [design 
> doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35917) Disable push-based shuffle until the feature is complete

2021-06-28 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-35917:
--
Description: 
Push-based shuffle is partially merged in Apache master, but some of the tasks 
are still incomplete. Since the 3.2 branch is going to be cut soon, we will not 
be able to get the pending tasks reviewed and merged. A few of the pending tasks 
make protocol changes to the push-based shuffle protocols, so we would like to 
prevent users from enabling push-based shuffle on both the client and the 
server until the push-based shuffle implementation is complete. 
We can prevent push-based shuffle from being used by throwing 
{{UnsupportedOperationException}} (or something like that) on both the client 
and the server when the user tries to enable it.
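As a loose illustration of that approach (a sketch under assumptions, not the actual patch; the config key name and the helper are placeholders):

{code:scala}
import org.apache.spark.SparkConf

// Hypothetical guard, callable on either side: fail fast if a user enables
// push-based shuffle before the implementation is complete.
object PushBasedShuffleGuard {
  // Assumed name of the client-side switch; treat it as a placeholder.
  val PUSH_BASED_SHUFFLE_ENABLED = "spark.shuffle.push.enabled"

  def check(conf: SparkConf): Unit = {
    if (conf.getBoolean(PUSH_BASED_SHUFFLE_ENABLED, defaultValue = false)) {
      throw new UnsupportedOperationException(
        "Push-based shuffle is not yet complete and cannot be enabled.")
    }
  }
}
{code}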

> Disable push-based shuffle until the feature is complete
> 
>
> Key: SPARK-35917
> URL: https://issues.apache.org/jira/browse/SPARK-35917
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> Push-based shuffle is partially merged in Apache master, but some of the tasks 
> are still incomplete. Since the 3.2 branch is going to be cut soon, we will not 
> be able to get the pending tasks reviewed and merged. A few of the pending tasks 
> make protocol changes to the push-based shuffle protocols, so we would like to 
> prevent users from enabling push-based shuffle on both the client and the 
> server until the push-based shuffle implementation is complete. 
> We can prevent push-based shuffle from being used by throwing 
> {{UnsupportedOperationException}} (or something like that) on both the client 
> and the server when the user tries to enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35917) Disable push-based shuffle until the feature is complete

2021-06-28 Thread Chandni Singh (Jira)
Chandni Singh created SPARK-35917:
-

 Summary: Disable push-based shuffle until the feature is complete
 Key: SPARK-35917
 URL: https://issues.apache.org/jira/browse/SPARK-35917
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Affects Versions: 3.1.0
Reporter: Chandni Singh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35898) Converting arrays with RowToColumnConverter triggers assertion

2021-06-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-35898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-35898.
---
Fix Version/s: 3.1.3
   3.2.0
   Resolution: Fixed

> Converting arrays with RowToColumnConverter triggers assertion
> --
>
> Key: SPARK-35898
> URL: https://issues.apache.org/jira/browse/SPARK-35898
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Tom van Bussel
>Assignee: Tom van Bussel
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> When trying to convert a row that contains an array to a ColumnVector with 
> RowToColumnConverter the following error is thrown:
> {code:java}
> java.lang.AssertionError at 
> org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.putArray(OffHeapColumnVector.java:560)
>  at 
> org.apache.spark.sql.execution.vectorized.WritableColumnVector.appendArray(WritableColumnVector.java:622)
>  at 
> org.apache.spark.sql.execution.RowToColumnConverter$ArrayConverter.append(Columnar.scala:353)
>  at 
> org.apache.spark.sql.execution.RowToColumnConverter$BasicNullableTypeConverter.append(Columnar.scala:241)
>  at 
> org.apache.spark.sql.execution.RowToColumnConverter.convert(Columnar.scala:221)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35898) Converting arrays with RowToColumnConverter triggers assertion

2021-06-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-35898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-35898:
-

Assignee: Tom van Bussel

> Converting arrays with RowToColumnConverter triggers assertion
> --
>
> Key: SPARK-35898
> URL: https://issues.apache.org/jira/browse/SPARK-35898
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Tom van Bussel
>Assignee: Tom van Bussel
>Priority: Major
>
> When trying to convert a row that contains an array to a ColumnVector with 
> RowToColumnConverter the following error is thrown:
> {code:java}
> java.lang.AssertionError at 
> org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.putArray(OffHeapColumnVector.java:560)
>  at 
> org.apache.spark.sql.execution.vectorized.WritableColumnVector.appendArray(WritableColumnVector.java:622)
>  at 
> org.apache.spark.sql.execution.RowToColumnConverter$ArrayConverter.append(Columnar.scala:353)
>  at 
> org.apache.spark.sql.execution.RowToColumnConverter$BasicNullableTypeConverter.append(Columnar.scala:241)
>  at 
> org.apache.spark.sql.execution.RowToColumnConverter.convert(Columnar.scala:221)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35802) Error loading the stages/stage/ page in spark UI

2021-06-28 Thread Helt Long (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370626#comment-17370626
 ] 

Helt Long commented on SPARK-35802:
---

Sounds like that's the key point. I will try a higher Hadoop version and close it 
myself. Thanks [~sarutak]!

> Error loading the stages/stage/ page in spark UI
> 
>
> Key: SPARK-35802
> URL: https://issues.apache.org/jira/browse/SPARK-35802
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark on yarn cluster mode
>Reporter: Helt Long
>Priority: Major
> Attachments: spark3.1.2-request-20210628093538.png, 
> spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png
>
>
> When I try to load the Spark UI page for a specific stage, I get the following 
> error:
> {quote}Unable to connect to the server. Looks like the Spark application must 
> have ended. Please Switch to the history UI.
> {quote}
> Obviously the server is still alive and processes new messages.
> Looking at the network tab shows one of the requests fails:
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'
> 
>  
>  
>  Error 500 Request failed.
>  
>  HTTP ERROR 500
>  Problem accessing 
> /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason:
>   Request failed. Powered by Jetty:// 9.4.z-SNAPSHOT
> 
>  }}
> requests to any other object that I've tested seem to work, for example
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}
>  
> The exception is:
> {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
>  javax.servlet.ServletException: java.lang.NullPointerException
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>  at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>  at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>  at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>  at org.sparkproject.jetty.server.Server.handle(Server.java:505)
>  at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>  at 
> org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>  at 
> org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>  at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
>  at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>  at 
> org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>  at 
> 

[jira] [Commented] (SPARK-35802) Error loading the stages/stage/ page in spark UI

2021-06-28 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370590#comment-17370590
 ] 

Kousuke Saruta commented on SPARK-35802:


The Hadoop version is 2.9.0 in my environment.
I ran an application in yarn-cluster mode but the issue was not reproduced. 

> Error loading the stages/stage/ page in spark UI
> 
>
> Key: SPARK-35802
> URL: https://issues.apache.org/jira/browse/SPARK-35802
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark on yarn cluster mode
>Reporter: Helt Long
>Priority: Major
> Attachments: spark3.1.2-request-20210628093538.png, 
> spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png
>
>
> When I try to load the Spark UI page for a specific stage, I get the following 
> error:
> {quote}Unable to connect to the server. Looks like the Spark application must 
> have ended. Please Switch to the history UI.
> {quote}
> Obviously the server is still alive and processes new messages.
> Looking at the network tab shows one of the requests fails:
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'
> 
>  
>  
>  Error 500 Request failed.
>  
>  HTTP ERROR 500
>  Problem accessing 
> /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason:
>   Request failed. Powered by Jetty:// 9.4.z-SNAPSHOT
> 
>  }}
> requests to any other object that I've tested seem to work, for example
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}
>  
> The exception is:
> {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
>  javax.servlet.ServletException: java.lang.NullPointerException
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>  at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>  at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>  at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>  at org.sparkproject.jetty.server.Server.handle(Server.java:505)
>  at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>  at 
> org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>  at 
> org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>  at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
>  at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>  at 
> org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>  at 
> 

[jira] [Assigned] (SPARK-35916) Support subtraction among Date/Timestamp/TimestampWithoutTZ

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35916:


Assignee: Gengliang Wang  (was: Apache Spark)

> Support subtraction among Date/Timestamp/TimestampWithoutTZ
> ---
>
> Key: SPARK-35916
> URL: https://issues.apache.org/jira/browse/SPARK-35916
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support the following operations:
> * TimestampWithoutTZ - Date
> * Date - TimestampWithoutTZ
> * TimestampWithoutTZ - Timestamp
> * Timestamp - TimestampWithoutTZ
> * TimestampWithoutTZ - TimestampWithoutTZ



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35916) Support subtraction among Date/Timestamp/TimestampWithoutTZ

2021-06-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370578#comment-17370578
 ] 

Apache Spark commented on SPARK-35916:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33115

> Support subtraction among Date/Timestamp/TimestampWithoutTZ
> ---
>
> Key: SPARK-35916
> URL: https://issues.apache.org/jira/browse/SPARK-35916
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support the following operations:
> * TimestampWithoutTZ - Date
> * Date - TimestampWithoutTZ
> * TimestampWithoutTZ - Timestamp
> * Timestamp - TimestampWithoutTZ
> * TimestampWithoutTZ - TimestampWithoutTZ



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35916) Support subtraction among Date/Timestamp/TimestampWithoutTZ

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35916:


Assignee: Apache Spark  (was: Gengliang Wang)

> Support subtraction among Date/Timestamp/TimestampWithoutTZ
> ---
>
> Key: SPARK-35916
> URL: https://issues.apache.org/jira/browse/SPARK-35916
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Support the following operations:
> * TimestampWithoutTZ - Date
> * Date - TimestampWithoutTZ
> * TimestampWithoutTZ - Timestamp
> * Timestamp - TimestampWithoutTZ
> * TimestampWithoutTZ - TimestampWithoutTZ



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35916) Support subtraction among Date/Timestamp/TimestampWithoutTZ

2021-06-28 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35916:
--

 Summary: Support subtraction among 
Date/Timestamp/TimestampWithoutTZ
 Key: SPARK-35916
 URL: https://issues.apache.org/jira/browse/SPARK-35916
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Support the following operations:
* TimestampWithoutTZ - Date
* Date - TimestampWithoutTZ
* TimestampWithoutTZ - Timestamp
* Timestamp - TimestampWithoutTZ
* TimestampWithoutTZ - TimestampWithoutTZ



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35912) [SQL] JSON read behavior is different depending on the cache setting when nullable is false.

2021-06-28 Thread Fu Chen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370512#comment-17370512
 ] 

Fu Chen commented on SPARK-35912:
-

Working on this

> [SQL] JSON read behavior is different depending on the cache setting when 
> nullable is false.
> 
>
> Key: SPARK-35912
> URL: https://issues.apache.org/jira/browse/SPARK-35912
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Heedo Lee
>Priority: Minor
>
> Below is the reproduced code.
>  
> {code:java}
> import org.apache.spark.sql.Encoders
>  
> case class TestSchema(x: Int, y: Int)
> case class BaseSchema(value: TestSchema)
>  
> val schema = Encoders.product[BaseSchema].schema
> val testDS = Seq("""{"value":{"x":1}}""", """{"value":{"x":2}}""").toDS
> val jsonDS = spark.read.schema(schema).json(testDS)
> jsonDS.show
> +---------+
> |    value|
> +---------+
> |{1, null}|
> |{2, null}|
> +---------+
> jsonDS.cache.show
> +------+
> | value|
> +------+
> |{1, 0}|
> |{2, 0}|
> +------+
> {code}
>  
> The above result occurs when a schema is created with a nested StructType and 
> the nullable flag of a StructField is false.
>  
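A possible interim workaround (an assumption, not a confirmed fix) is to declare the nested fields as nullable, so that missing JSON fields stay null on both the cached and uncached paths. A minimal sketch:

{code:scala}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import spark.implicits._  // assumes a SparkSession named `spark` is in scope

// Same shape as Encoders.product[BaseSchema].schema, but with every field
// explicitly nullable.
val nullableSchema = StructType(Seq(
  StructField("value", StructType(Seq(
    StructField("x", IntegerType, nullable = true),
    StructField("y", IntegerType, nullable = true))), nullable = true)))

val testDS = Seq("""{"value":{"x":1}}""", """{"value":{"x":2}}""").toDS
val jsonDS = spark.read.schema(nullableSchema).json(testDS)
jsonDS.show()       // expected: {1, null}, {2, null}
jsonDS.cache.show() // expected to match the uncached output with this schema
{code}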



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35915) Kafka doesn't recover from data loss

2021-06-28 Thread Yuval Yellin (Jira)
Yuval Yellin created SPARK-35915:


 Summary: Kafka doesn't recover from data loss
 Key: SPARK-35915
 URL: https://issues.apache.org/jira/browse/SPARK-35915
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.1.1
Reporter: Yuval Yellin


I configured a structured streaming source for Kafka with failOnDataLoss=false.

I get this error when checkpoint offsets are not found:

 
{code:java}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 7 in stage 5.0 failed 1 times, most recent failure: Lost task 7.0 in stage 
5.0 (TID 113) ( executor driver): java.lang.IllegalStateException: This 
consumer has already been closed.
  at 
org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2439)
  at 
org.apache.kafka.clients.consumer.KafkaConsumer.seekToBeginning(KafkaConsumer.java:1656)
  at 
org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumer.getAvailableOffsetRange(KafkaDataConsumer.scala:108)
  at 
org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.getEarliestAvailableOffsetBetween(KafkaDataConsumer.scala:385)
  at 
org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:332)
  at 
org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
  at 
org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:604)
  at 
org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.get(KafkaDataConsumer.scala:287)
  at 
org.apache.spark.sql.kafka010.KafkaBatchPartitionReader.next(KafkaBatchPartitionReader.scala:63)
  at 
org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:79)
  at 
org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:112)
  at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)

{code}
 

The issue seems to me to be related to the handling of OffsetOutOfRangeException 
(around line 323 in KafkaDataConsumer):

 
{code:java}
case e: OffsetOutOfRangeException =>
  // When there is some error thrown, it's better to use a new consumer to drop all cached
  // states in the old consumer. We don't need to worry about the performance because this
  // is not a common path.
  releaseConsumer()
  fetchedData.reset()

  reportDataLoss(topicPartition, groupId, failOnDataLoss,
    s"Cannot fetch offset $toFetchOffset", e)
  toFetchOffset = getEarliestAvailableOffsetBetween(consumer, toFetchOffset, untilOffset)
}
{code}
It seems like releaseConsumer will destroy the consumer, which is then used later...
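
If that is what happens, a minimal sketch of a possible fix (the helper name getOrRetrieveConsumer() and its exact signature are my assumptions, not taken from the Spark source):

{code:scala}
case e: OffsetOutOfRangeException =>
  releaseConsumer()
  fetchedData.reset()

  reportDataLoss(topicPartition, groupId, failOnDataLoss,
    s"Cannot fetch offset $toFetchOffset", e)
  // Re-acquire a consumer instead of reusing the one that was just released;
  // getOrRetrieveConsumer() stands for whatever helper re-creates the internal consumer.
  toFetchOffset =
    getEarliestAvailableOffsetBetween(getOrRetrieveConsumer(), toFetchOffset, untilOffset)
{code}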

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35914) Driver can't distribute task to executor because NullPointerException

2021-06-28 Thread Helt Long (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Helt Long updated SPARK-35914:
--
Attachment: webui stuck.png

> Driver can't distribute task to executor because NullPointerException
> -
>
> Key: SPARK-35914
> URL: https://issues.apache.org/jira/browse/SPARK-35914
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark 3.0.1, 3.1.1, 3.1.2
>Reporter: Helt Long
>Priority: Major
> Attachments: stuck log.png, webui stuck.png
>
>
> When I use Spark 3 to submit a job to a YARN cluster, I sometimes hit a problem: 
> the driver can't distribute any tasks to any executors, the stage gets stuck, 
> and the whole Spark job gets stuck. Checking the driver log, I found a 
> NullPointerException. It looks like a Netty problem. I can confirm this problem 
> only exists in Spark 3; it never happened when I used Spark 2.
>  
> {code:java}
> // Error message
> 21/06/28 14:42:43 INFO TaskSetManager: Starting task 2592.0 in stage 1.0 (TID 
> 3494) (worker39.hadoop, executor 84, partition 2592, RACK_LOCAL, 5006 bytes) 
> taskResourceAssignments Map()
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 4155.0 in stage 1.0 (TID 
> 3367) in 36670 ms on worker39.hadoop (executor 84) (3278/4249)
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 2283.0 in stage 1.0 (TID 
> 3422) in 22371 ms on worker15.hadoop (executor 109) (3279/4249)
> 21/06/28 14:42:43 ERROR Inbox: Ignoring error
> java.lang.NullPointerException
>   at java.lang.String.length(String.java:623)
>   at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:420)
>   at java.lang.StringBuilder.append(StringBuilder.java:136)
>   at 
> org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$5(TaskSetManager.scala:483)
>   at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
>   at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
>   at 
> org.apache.spark.scheduler.TaskSetManager.logInfo(TaskSetManager.scala:54)
>   at 
> org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:484)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:444)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:397)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:392)
>   at scala.Option.foreach(Option.scala:407)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:392)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:383)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:581)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:576)
>   at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:576)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:547)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:547)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$5(CoarseGrainedSchedulerBackend.scala:340)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:904)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:332)
>   at 
> 

[jira] [Updated] (SPARK-35914) Driver can't distribute task to executor because NullPointerException

2021-06-28 Thread Helt Long (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Helt Long updated SPARK-35914:
--
Attachment: stuck log.png

> Driver can't distribute task to executor because NullPointerException
> -
>
> Key: SPARK-35914
> URL: https://issues.apache.org/jira/browse/SPARK-35914
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark 3.0.1, 3.1.1, 3.1.2
>Reporter: Helt Long
>Priority: Major
> Attachments: stuck log.png, webui stuck.png
>
>
> When I use Spark 3 to submit a job to a YARN cluster, I sometimes hit a problem: 
> the driver can't distribute any tasks to any executors, the stage gets stuck, 
> and the whole Spark job gets stuck. Checking the driver log, I found a 
> NullPointerException. It looks like a Netty problem. I can confirm this problem 
> only exists in Spark 3; it never happened when I used Spark 2.
>  
> {code:java}
> // Error message
> 21/06/28 14:42:43 INFO TaskSetManager: Starting task 2592.0 in stage 1.0 (TID 
> 3494) (worker39.hadoop, executor 84, partition 2592, RACK_LOCAL, 5006 bytes) 
> taskResourceAssignments Map()
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 4155.0 in stage 1.0 (TID 
> 3367) in 36670 ms on worker39.hadoop (executor 84) (3278/4249)
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 2283.0 in stage 1.0 (TID 
> 3422) in 22371 ms on worker15.hadoop (executor 109) (3279/4249)
> 21/06/28 14:42:43 ERROR Inbox: Ignoring error
> java.lang.NullPointerException
>   at java.lang.String.length(String.java:623)
>   at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:420)
>   at java.lang.StringBuilder.append(StringBuilder.java:136)
>   at 
> org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$5(TaskSetManager.scala:483)
>   at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
>   at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
>   at 
> org.apache.spark.scheduler.TaskSetManager.logInfo(TaskSetManager.scala:54)
>   at 
> org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:484)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:444)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:397)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:392)
>   at scala.Option.foreach(Option.scala:407)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:392)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:383)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:581)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:576)
>   at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:576)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:547)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:547)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$5(CoarseGrainedSchedulerBackend.scala:340)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:904)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:332)
>   at 
> 

[jira] [Assigned] (SPARK-35904) Collapse above RebalancePartitions

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35904:


Assignee: Yuming Wang  (was: Apache Spark)

> Collapse above RebalancePartitions
> --
>
> Key: SPARK-35904
> URL: https://issues.apache.org/jira/browse/SPARK-35904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Make RebalancePartitions extends RepartitionOperation.
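
A hedged sketch (not the actual Catalyst change) of the intended user-visible effect: once RebalancePartitions is treated as a RepartitionOperation, a repartition stacked on top of a REBALANCE hint can be collapsed by the existing repartition-collapsing rules into a single shuffle. Whether the plan below actually collapses is an assumption about the optimization, not something confirmed in this ticket:

{code:scala}
val df = spark.range(100)

// Rebalance followed by an explicit repartition; with the proposed change the
// optimizer could collapse these into one exchange instead of two.
df.hint("rebalance").repartition(10).explain()
{code}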



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35904) Collapse above RebalancePartitions

2021-06-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35904:


Assignee: Apache Spark  (was: Yuming Wang)

> Collapse above RebalancePartitions
> --
>
> Key: SPARK-35904
> URL: https://issues.apache.org/jira/browse/SPARK-35904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> Make RebalancePartitions extends RepartitionOperation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35904) Collapse above RebalancePartitions

2021-06-28 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-35904:

Fix Version/s: (was: 3.2.0)

> Collapse above RebalancePartitions
> --
>
> Key: SPARK-35904
> URL: https://issues.apache.org/jira/browse/SPARK-35904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Make RebalancePartitions extends RepartitionOperation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-35904) Collapse above RebalancePartitions

2021-06-28 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reopened SPARK-35904:
-

Reverted at 
https://github.com/apache/spark/commit/108635af1708173a72bec0e36bf3f2cea5b088c4

> Collapse above RebalancePartitions
> --
>
> Key: SPARK-35904
> URL: https://issues.apache.org/jira/browse/SPARK-35904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Make RebalancePartitions extends RepartitionOperation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-06-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-35672:
-
Fix Version/s: 3.1.3

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35802) Error loading the stages/stage/ page in spark UI

2021-06-28 Thread Helt Long (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370491#comment-17370491
 ] 

Helt Long commented on SPARK-35802:
---

[~sarutak]

I used CDH 5.7.1 with Spark in YARN cluster mode, and this problem happens all 
the time. When I googled this problem, I found the same question on 
Stack Overflow, so I posted it there:

[Error loading the stages/stage/ page in spark UI - Stack 
Overflow|https://stackoverflow.com/questions/64265444/error-loading-the-stages-stage-id-page-in-spark-ui]

I can reproduce the problem 100% of the time.

Some details about the environment:

CDH 5.7.1: Hadoop 2.6.5

Spark on YARN cluster mode

> Error loading the stages/stage/ page in spark UI
> 
>
> Key: SPARK-35802
> URL: https://issues.apache.org/jira/browse/SPARK-35802
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark on yarn cluster mode
>Reporter: Helt Long
>Priority: Major
> Attachments: spark3.1.2-request-20210628093538.png, 
> spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png
>
>
> When I try to load the Spark UI page for a specific stage, I get the following 
> error:
> {quote}Unable to connect to the server. Looks like the Spark application must 
> have ended. Please Switch to the history UI.
> {quote}
> Obviously the server is still alive and processes new messages.
> Looking at the network tab shows that one of the requests fails:
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'}}
> returns an HTML error page:
> {quote}
> Error 500 Request failed.
> HTTP ERROR 500
> Problem accessing 
> /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed.
> Powered by Jetty 9.4.z-SNAPSHOT
> {quote}
> requests to any other object that I've tested seem to work, for example
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}
>  
> The exception is:
> {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
>  javax.servlet.ServletException: java.lang.NullPointerException
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>  at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>  at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>  at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>  at org.sparkproject.jetty.server.Server.handle(Server.java:505)
>  at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>  at 
> org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>  at 
> org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>  at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
>  at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>  at 
> 

[jira] [Comment Edited] (SPARK-35802) Error loading the stages/stage/ page in spark UI

2021-06-28 Thread Helt Long (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370491#comment-17370491
 ] 

Helt Long edited comment on SPARK-35802 at 6/28/21, 8:33 AM:
-

[~sarutak]

I used CDH 5.7.1 with Spark in YARN cluster mode, and this problem happens all 
the time. When I googled this problem, I found the same question on 
Stack Overflow, so I posted it there:

[Error loading the stages/stage/ page in spark UI - Stack 
Overflow|https://stackoverflow.com/questions/64265444/error-loading-the-stages-stage-id-page-in-spark-ui]

I can reproduce the problem 100% of the time.

Some details about the environment:

CDH 5.7.1: Hadoop 2.6.5

Spark on YARN cluster mode


was (Author: heltman):
[~sarutak]

I used CDH 5.7.1, and used Spark on Yarn cluster mode, this problem happend all 
the time. When I google thhis problem, I found the same problem on 
stackoverflow, so I move the problem there

[Error loading the stages/stage/ page in spark UI - Stack 
Overflow|https://stackoverflow.com/questions/64265444/error-loading-the-stages-stage-id-page-in-spark-ui]

I can 100% recurrence problem

I add some message about env like blow: 

CDH 5.7.1: Hadoop 2.6.5

Spark on yarn cluster mode

> Error loading the stages/stage/ page in spark UI
> 
>
> Key: SPARK-35802
> URL: https://issues.apache.org/jira/browse/SPARK-35802
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark on yarn cluster mode
>Reporter: Helt Long
>Priority: Major
> Attachments: spark3.1.2-request-20210628093538.png, 
> spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png
>
>
> When I try to load the Spark UI page for a specific stage, I get the following 
> error:
> {quote}Unable to connect to the server. Looks like the Spark application must 
> have ended. Please Switch to the history UI.
> {quote}
> Obviously the server is still alive and processes new messages.
> Looking at the network tab shows that one of the requests fails:
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'}}
> returns an HTML error page:
> {quote}
> Error 500 Request failed.
> HTTP ERROR 500
> Problem accessing 
> /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed.
> Powered by Jetty 9.4.z-SNAPSHOT
> {quote}
> requests to any other object that I've tested seem to work, for example
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}
>  
> The exception is:
> {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
>  javax.servlet.ServletException: java.lang.NullPointerException
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>  at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>  at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>  at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>  at org.sparkproject.jetty.server.Server.handle(Server.java:505)
>  at 

[jira] [Updated] (SPARK-35802) Error loading the stages/stage/ page in spark UI

2021-06-28 Thread Helt Long (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Helt Long updated SPARK-35802:
--
Description: 
When I try to load the Spark UI page for a specific stage, I get the following error:
{quote}Unable to connect to the server. Looks like the Spark application must 
have ended. Please Switch to the history UI.
{quote}
Obviously the server is still alive and processes new messages.

Looking at the network tab shows that one of the requests fails:

{{curl 
'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'}}

returns an HTML error page:

{quote}
Error 500 Request failed.
HTTP ERROR 500
Problem accessing /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed.
Powered by Jetty 9.4.z-SNAPSHOT
{quote}

requests to any other object that I've tested seem to work, for example

 

{{curl 
'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}

 

The exception is:

{{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
 javax.servlet.ServletException: java.lang.NullPointerException
 at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
 at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
 at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
 at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
 at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
 at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
 at 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
 at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
 at 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
 at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
 at 
org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
 at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
 at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
 at 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
 at 
org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
 at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
 at org.sparkproject.jetty.server.Server.handle(Server.java:505)
 at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
 at 
org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
 at 
org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
 at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
 at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
 at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
 at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
 at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
 at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
 at 
org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
 at 
org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
 at 
org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: java.lang.NullPointerException
 at 
org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:175)
 at 
org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:140)
 at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:107)
 at 
org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:135)
 at 
org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:133)
 at 
org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:28)
 at 
org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:174)
 at 
org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:129)
 at 

[jira] [Created] (SPARK-35914) Driver can't distribute task to executor because NullPointerException

2021-06-28 Thread Helt Long (Jira)
Helt Long created SPARK-35914:
-

 Summary: Driver can't distribute task to executor because 
NullPointerException
 Key: SPARK-35914
 URL: https://issues.apache.org/jira/browse/SPARK-35914
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2, 3.1.1, 3.0.1
 Environment: CDH 5.7.1: Hadoop 2.6.5

Spark 3.0.1, 3.1.1, 3.1.2
Reporter: Helt Long


When I use Spark 3 to submit a job to a YARN cluster, I sometimes hit a problem: 
the driver can't distribute any tasks to any executors, the stage gets stuck, and 
the whole Spark job gets stuck. Checking the driver log, I found a 
NullPointerException. It looks like a Netty problem. I can confirm this problem 
only exists in Spark 3; it never happened when I used Spark 2.

 
{code:java}
// Error message
21/06/28 14:42:43 INFO TaskSetManager: Starting task 2592.0 in stage 1.0 (TID 
3494) (worker39.hadoop, executor 84, partition 2592, RACK_LOCAL, 5006 bytes) 
taskResourceAssignments Map()
21/06/28 14:42:43 INFO TaskSetManager: Finished task 4155.0 in stage 1.0 (TID 
3367) in 36670 ms on worker39.hadoop (executor 84) (3278/4249)
21/06/28 14:42:43 INFO TaskSetManager: Finished task 2283.0 in stage 1.0 (TID 
3422) in 22371 ms on worker15.hadoop (executor 109) (3279/4249)
21/06/28 14:42:43 ERROR Inbox: Ignoring error
java.lang.NullPointerException
at java.lang.String.length(String.java:623)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:420)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at 
org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$5(TaskSetManager.scala:483)
at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
at 
org.apache.spark.scheduler.TaskSetManager.logInfo(TaskSetManager.scala:54)
at 
org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:484)
at scala.Option.map(Option.scala:230)
at 
org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:444)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:397)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:392)
at scala.Option.foreach(Option.scala:407)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:392)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:383)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:581)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:576)
at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at 
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:576)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:547)
at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:547)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$5(CoarseGrainedSchedulerBackend.scala:340)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:904)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:332)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:157)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
 

[jira] [Resolved] (SPARK-35258) Enhance ESS ExternalBlockHandler with additional block rate-based metrics and histograms

2021-06-28 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-35258.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32388
[https://github.com/apache/spark/pull/32388]

> Enhance ESS ExternalBlockHandler with additional block rate-based metrics and 
> histograms
> 
>
> Key: SPARK-35258
> URL: https://issues.apache.org/jira/browse/SPARK-35258
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, YARN
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.2.0
>
>
> Today the {{ExternalBlockHandler}} component of ESS exposes some useful 
> metrics, but is lacking around metrics for the rate of block transfers. We 
> have {{blockTransferRateBytes}} to tell us the rate of _bytes_, but no metric 
> to tell us the rate of _blocks_, which is especially relevant when running 
> the ESS on HDDs that are sensitive to random reads. Many small block 
> transfers can have a negative impact on performance, but won't show up as a 
> spike in {{blockTransferRateBytes}} since the sizes are small.
> We can also enhance {{YarnShuffleServiceMetrics}} to expose histogram-style 
> metrics from the {{Timer}} instances within {{ExternalBlockHandler}} -- today 
> it is only exposing the count and rate, but not timing information from the 
> {{Snapshot}}.
> These two changes can make it easier to monitor the health of the ESS.
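
As a hedged illustration (not the actual YarnShuffleServiceMetrics code) of what exposing histogram-style data from a Dropwizard {{Timer}} looks like: read the {{Snapshot}} and publish its percentiles alongside the count and rate. The metric names are made up for the example:

{code:scala}
import com.codahale.metrics.Timer

// Turn one Timer into a flat map of values: count, 1-minute rate, and latency percentiles.
def timerStats(name: String, timer: Timer): Map[String, Double] = {
  val snap = timer.getSnapshot  // Timer snapshots record durations in nanoseconds
  Map(
    s"${name}_count"  -> timer.getCount.toDouble,
    s"${name}_rate1m" -> timer.getOneMinuteRate,
    s"${name}_p50_ms" -> snap.getMedian / 1e6,
    s"${name}_p95_ms" -> snap.get95thPercentile / 1e6,
    s"${name}_p99_ms" -> snap.get99thPercentile / 1e6)
}
{code}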



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35258) Enhance ESS ExternalBlockHandler with additional block rate-based metrics and histograms

2021-06-28 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-35258:
---

Assignee: Erik Krogen

> Enhance ESS ExternalBlockHandler with additional block rate-based metrics and 
> histograms
> 
>
> Key: SPARK-35258
> URL: https://issues.apache.org/jira/browse/SPARK-35258
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, YARN
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> Today the {{ExternalBlockHandler}} component of ESS exposes some useful 
> metrics, but is lacking around metrics for the rate of block transfers. We 
> have {{blockTransferRateBytes}} to tell us the rate of _bytes_, but no metric 
> to tell us the rate of _blocks_, which is especially relevant when running 
> the ESS on HDDs that are sensitive to random reads. Many small block 
> transfers can have a negative impact on performance, but won't show up as a 
> spike in {{blockTransferRateBytes}} since the sizes are small.
> We can also enhance {{YarnShuffleServiceMetrics}} to expose histogram-style 
> metrics from the {{Timer}} instances within {{ExternalBlockHandler}} -- today 
> it is only exposing the count and rate, but not timing information from the 
> {{Snapshot}}.
> These two changes can make it easier to monitor the health of the ESS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35064) Group exception messages in spark/sql (catalyst)

2021-06-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35064:
---

Assignee: dgd_contributor

> Group exception messages in spark/sql (catalyst)
> 
>
> Key: SPARK-35064
> URL: https://issues.apache.org/jira/browse/SPARK-35064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: dgd_contributor
>Priority: Major
>
> Group all errors in sql/catalyst/src/main/scala/org/apache/spark/sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35064) Group exception messages in spark/sql (catalyst)

2021-06-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35064.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32916
[https://github.com/apache/spark/pull/32916]

> Group exception messages in spark/sql (catalyst)
> 
>
> Key: SPARK-35064
> URL: https://issues.apache.org/jira/browse/SPARK-35064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: dgd_contributor
>Priority: Major
> Fix For: 3.2.0
>
>
> Group all errors in sql/catalyst/src/main/scala/org/apache/spark/sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35802) Error loading the stages/stage/ page in spark UI

2021-06-28 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370448#comment-17370448
 ] 

Kousuke Saruta commented on SPARK-35802:


[~Heltman] According to the URL, I guess you run your application on YARN.
I ran spark-shell on YARN with Spark 3.1.2, but this issue didn't happen...
Could you narrow down the conditions to reproduce this issue?

> Error loading the stages/stage/ page in spark UI
> 
>
> Key: SPARK-35802
> URL: https://issues.apache.org/jira/browse/SPARK-35802
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2
>Reporter: Helt Long
>Priority: Major
> Attachments: spark3.1.2-request-20210628093538.png, 
> spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png
>
>
> When I try to load the Spark UI page for a specific stage, I get the following 
> error:
> {quote}Unable to connect to the server. Looks like the Spark application must 
> have ended. Please Switch to the history UI.
> {quote}
> Obviously the server is still alive and processes new messages.
> Looking at the network tab shows that one of the requests fails:
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'}}
> returns an HTML error page:
> {quote}
> Error 500 Request failed.
> HTTP ERROR 500
> Problem accessing 
> /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed.
> Powered by Jetty 9.4.z-SNAPSHOT
> {quote}
> requests to any other object that I've tested seem to work, for example
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}
>  
> The exception is:
> {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
> javax.servlet.ServletException: java.lang.NullPointerException
> at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
> at 
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
> at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
> at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
> at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
> at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
> at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
> at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.sparkproject.jetty.server.Server.handle(Server.java:505)
> at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
> at 
> org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
> at 
> org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
> at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at 
> org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
> at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
> at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
> at 
> 

[jira] [Reopened] (SPARK-35802) Error loading the stages/stage/ page in spark UI

2021-06-28 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta reopened SPARK-35802:


> Error loading the stages/stage/ page in spark UI
> 
>
> Key: SPARK-35802
> URL: https://issues.apache.org/jira/browse/SPARK-35802
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2
>Reporter: Helt Long
>Priority: Major
> Attachments: spark3.1.2-request-20210628093538.png, 
> spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png
>
>
> When I try to load the Spark UI page for a specific stage, I get the following 
> error:
> {quote}Unable to connect to the server. Looks like the Spark application must 
> have ended. Please Switch to the history UI.
> {quote}
> Obviously the server is still alive and processes new messages.
> Looking at the network tab shows that one of the requests fails:
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'}}
> returns an HTML error page:
> {quote}
> Error 500 Request failed.
> HTTP ERROR 500
> Problem accessing 
> /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed.
> Powered by Jetty 9.4.z-SNAPSHOT
> {quote}
> requests to any other object that I've tested seem to work, for example
>  
> {{curl 
> 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}
>  
> The exception is:
> {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
> javax.servlet.ServletException: java.lang.NullPointerException
> at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
> at 
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
> at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
> at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
> at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
> at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
> at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
> at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.sparkproject.jetty.server.Server.handle(Server.java:505)
> at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
> at 
> org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
> at 
> org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
> at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at 
> org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
> at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
> at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
> at 
> org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
> at 
> org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
> at 
>