[jira] [Commented] (SPARK-35924) Add Java 17 ea build test to GitHub action
[ https://issues.apache.org/jira/browse/SPARK-35924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371076#comment-17371076 ]

Apache Spark commented on SPARK-35924:
--------------------------------------

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33126

> Add Java 17 ea build test to GitHub action
> ------------------------------------------
>
>                 Key: SPARK-35924
>                 URL: https://issues.apache.org/jira/browse/SPARK-35924
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, Tests
>    Affects Versions: 3.2.0
>            Reporter: William Hyun
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35924) Add Java 17 ea build test to GitHub action
[ https://issues.apache.org/jira/browse/SPARK-35924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35924:
------------------------------------

    Assignee: Apache Spark

> Add Java 17 ea build test to GitHub action
> ------------------------------------------
>
>                 Key: SPARK-35924
>                 URL: https://issues.apache.org/jira/browse/SPARK-35924
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, Tests
>    Affects Versions: 3.2.0
>            Reporter: William Hyun
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Assigned] (SPARK-35924) Add Java 17 ea build test to GitHub action
[ https://issues.apache.org/jira/browse/SPARK-35924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35924:
------------------------------------

    Assignee: (was: Apache Spark)

> Add Java 17 ea build test to GitHub action
> ------------------------------------------
>
>                 Key: SPARK-35924
>                 URL: https://issues.apache.org/jira/browse/SPARK-35924
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, Tests
>    Affects Versions: 3.2.0
>            Reporter: William Hyun
>            Priority: Major
[jira] [Updated] (SPARK-35924) Add Java 17 ea build test to GitHub action
[ https://issues.apache.org/jira/browse/SPARK-35924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Hyun updated SPARK-35924:
---------------------------------
    Component/s: Tests

> Add Java 17 ea build test to GitHub action
> ------------------------------------------
>
>                 Key: SPARK-35924
>                 URL: https://issues.apache.org/jira/browse/SPARK-35924
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, Tests
>    Affects Versions: 3.2.0
>            Reporter: William Hyun
>            Priority: Major
[jira] [Created] (SPARK-35924) Add Java 17 ea build test to GitHub action
William Hyun created SPARK-35924:
---------------------------------

             Summary: Add Java 17 ea build test to GitHub action
                 Key: SPARK-35924
                 URL: https://issues.apache.org/jira/browse/SPARK-35924
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 3.2.0
            Reporter: William Hyun
[jira] [Commented] (SPARK-35483) Add a new GA test job for the docker integration tests
[ https://issues.apache.org/jira/browse/SPARK-35483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371063#comment-17371063 ]

Apache Spark commented on SPARK-35483:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33125

> Add a new GA test job for the docker integration tests
> ------------------------------------------------------
>
>                 Key: SPARK-35483
>                 URL: https://issues.apache.org/jira/browse/SPARK-35483
>             Project: Spark
>          Issue Type: Test
>          Components: SQL, Tests
>    Affects Versions: 3.0.2, 3.1.1, 3.2.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Kousuke Saruta
>            Priority: Major
>             Fix For: 3.2.0
>
>
> This ticket proposes to add a new GA test job for the integration tests.
[jira] [Resolved] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4
[ https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-35922.
-----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33122
[https://github.com/apache/spark/pull/33122]

> Upgrade maven-shade-plugin to 3.2.4
> -----------------------------------
>
>                 Key: SPARK-35922
>                 URL: https://issues.apache.org/jira/browse/SPARK-35922
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.2.0
[jira] [Assigned] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4
[ https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-35922:
-------------------------------------

    Assignee: Dongjoon Hyun

> Upgrade maven-shade-plugin to 3.2.4
> -----------------------------------
>
>                 Key: SPARK-35922
>                 URL: https://issues.apache.org/jira/browse/SPARK-35922
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
[jira] [Commented] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371041#comment-17371041 ]

Apache Spark commented on SPARK-34302:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33124

> Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
> ----------------------------------------------------------------
>
>                 Key: SPARK-34302
>                 URL: https://issues.apache.org/jira/browse/SPARK-34302
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Max Gekk
>            Assignee: Terry Kim
>            Priority: Major
>             Fix For: 3.2.0
>
>
> # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
> # Remove AlterTableAlterColumnStatement
> # Remove the check verifyAlterTableType() from run()
[jira] [Resolved] (SPARK-35920) Upgrade to Chill 0.10.0
[ https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-35920.
-----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33119
[https://github.com/apache/spark/pull/33119]

> Upgrade to Chill 0.10.0
> -----------------------
>
>                 Key: SPARK-35920
>                 URL: https://issues.apache.org/jira/browse/SPARK-35920
>             Project: Spark
>          Issue Type: Task
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.2.0
[jira] [Assigned] (SPARK-35920) Upgrade to Chill 0.10.0
[ https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-35920:
-------------------------------------

    Assignee: Dongjoon Hyun

> Upgrade to Chill 0.10.0
> -----------------------
>
>                 Key: SPARK-35920
>                 URL: https://issues.apache.org/jira/browse/SPARK-35920
>             Project: Spark
>          Issue Type: Task
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
[jira] [Assigned] (SPARK-35923) Coalesce empty partition with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec
[ https://issues.apache.org/jira/browse/SPARK-35923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35923:
------------------------------------

    Assignee: (was: Apache Spark)

> Coalesce empty partition with mixed CoalescedPartitionSpec and
> PartialReducerPartitionSpec
> --------------------------------------------------------------
>
>                 Key: SPARK-35923
>                 URL: https://issues.apache.org/jira/browse/SPARK-35923
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: XiDuo You
>            Priority: Major
>
> Since [SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447), we
> apply `OptimizeSkewedJoin` before `CoalesceShufflePartitions`. However, the
> order of these two rules changes the result.
> Say we have skewed partitions [0, 128MB, 0, 128MB, 0]:
> # coalesce partitions first, then optimize skewed partitions:
> [64MB, 64MB, 64MB, 64MB]
> # optimize skewed partitions first, then coalesce partitions:
> [0, 64MB, 64MB, 0, 64MB, 64MB, 0]
> So we can coalesce in ShufflePartitionsUtil.coalescePartitionsWithSkew with
> mixed CoalescedPartitionSpec and PartialReducerPartitionSpec if
> CoalescedPartitionSpec is empty.
[jira] [Assigned] (SPARK-35923) Coalesce empty partition with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec
[ https://issues.apache.org/jira/browse/SPARK-35923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35923:
------------------------------------

    Assignee: Apache Spark

> Coalesce empty partition with mixed CoalescedPartitionSpec and
> PartialReducerPartitionSpec
> --------------------------------------------------------------
>
>                 Key: SPARK-35923
>                 URL: https://issues.apache.org/jira/browse/SPARK-35923
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: XiDuo You
>            Assignee: Apache Spark
>            Priority: Major
>
> Since [SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447), we
> apply `OptimizeSkewedJoin` before `CoalesceShufflePartitions`. However, the
> order of these two rules changes the result.
> Say we have skewed partitions [0, 128MB, 0, 128MB, 0]:
> # coalesce partitions first, then optimize skewed partitions:
> [64MB, 64MB, 64MB, 64MB]
> # optimize skewed partitions first, then coalesce partitions:
> [0, 64MB, 64MB, 0, 64MB, 64MB, 0]
> So we can coalesce in ShufflePartitionsUtil.coalescePartitionsWithSkew with
> mixed CoalescedPartitionSpec and PartialReducerPartitionSpec if
> CoalescedPartitionSpec is empty.
[jira] [Commented] (SPARK-35923) Coalesce empty partition with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec
[ https://issues.apache.org/jira/browse/SPARK-35923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371038#comment-17371038 ]

Apache Spark commented on SPARK-35923:
--------------------------------------

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/33123

> Coalesce empty partition with mixed CoalescedPartitionSpec and
> PartialReducerPartitionSpec
> --------------------------------------------------------------
>
>                 Key: SPARK-35923
>                 URL: https://issues.apache.org/jira/browse/SPARK-35923
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: XiDuo You
>            Priority: Major
>
> Since [SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447), we
> apply `OptimizeSkewedJoin` before `CoalesceShufflePartitions`. However, the
> order of these two rules changes the result.
> Say we have skewed partitions [0, 128MB, 0, 128MB, 0]:
> # coalesce partitions first, then optimize skewed partitions:
> [64MB, 64MB, 64MB, 64MB]
> # optimize skewed partitions first, then coalesce partitions:
> [0, 64MB, 64MB, 0, 64MB, 64MB, 0]
> So we can coalesce in ShufflePartitionsUtil.coalescePartitionsWithSkew with
> mixed CoalescedPartitionSpec and PartialReducerPartitionSpec if
> CoalescedPartitionSpec is empty.
[jira] [Created] (SPARK-35923) Coalesce empty partition with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec
XiDuo You created SPARK-35923:
------------------------------

             Summary: Coalesce empty partition with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec
                 Key: SPARK-35923
                 URL: https://issues.apache.org/jira/browse/SPARK-35923
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: XiDuo You

Since [SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447), we apply `OptimizeSkewedJoin` before `CoalesceShufflePartitions`. However, the order of these two rules changes the result.

Say we have skewed partitions [0, 128MB, 0, 128MB, 0]:
# coalesce partitions first, then optimize skewed partitions: [64MB, 64MB, 64MB, 64MB]
# optimize skewed partitions first, then coalesce partitions: [0, 64MB, 64MB, 0, 64MB, 64MB, 0]

So we can coalesce in ShufflePartitionsUtil.coalescePartitionsWithSkew with mixed CoalescedPartitionSpec and PartialReducerPartitionSpec if CoalescedPartitionSpec is empty.
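The two orderings in the description can be reproduced with a small standalone sketch. This is plain Python mimicking the rule order, not Spark's actual `ShufflePartitionsUtil` code; the helper names, the 64MB target, and the greedy adjacent merge are illustrative assumptions:

```python
TARGET = 64  # advisory partition size in MB

def split_skewed(sizes, target=TARGET):
    # Simplified OptimizeSkewedJoin: split an oversized partition
    # into target-sized chunks, leave the rest untouched.
    out = []
    for s in sizes:
        out.extend([target] * (s // target) if s > target else [s])
    return out

def coalesce_small(sizes, target=TARGET):
    # Simplified CoalesceShufflePartitions: greedily merge adjacent
    # partitions (including empty ones) up to the target size.
    out, acc = [], 0
    for s in sizes:
        if acc and acc + s > target:
            out.append(acc)
            acc = 0
        acc += s
    if acc:
        out.append(acc)
    return out

sizes = [0, 128, 0, 128, 0]

# Coalesce first, then split the skewed partitions: the empty
# partitions are absorbed and four even partitions remain.
print(split_skewed(coalesce_small(sizes)))  # [64, 64, 64, 64]

# Split first: the chunks become PartialReducerPartitionSpec, which the
# coalescing step would not merge with neighbours, so the empty
# CoalescedPartitionSpec partitions survive.
print(split_skewed(sizes))  # [0, 64, 64, 0, 64, 64, 0]
```

The ticket's proposal is to let the coalescing step handle the mixed-spec case, so the second pipeline can also absorb the empty partitions.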
[jira] [Assigned] (SPARK-35876) array_zip unexpected column names
[ https://issues.apache.org/jira/browse/SPARK-35876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-35876:
------------------------------------

    Assignee: Kousuke Saruta

> array_zip unexpected column names
> ---------------------------------
>
>                 Key: SPARK-35876
>                 URL: https://issues.apache.org/jira/browse/SPARK-35876
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.2
>            Reporter: Derk Crezee
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> When I'm using the arrays_zip function in combination with renamed columns,
> I get an unexpected schema written to disk.
> {code:java}
> from pyspark.sql import *
> from pyspark.sql.functions import *
>
> spark = SparkSession.builder.getOrCreate()
>
> data = [
>     Row(a1=["a", "a"], b1=["b", "b"]),
> ]
> df = (
>     spark.sparkContext.parallelize(data).toDF()
>     .withColumnRenamed("a1", "a2")
>     .withColumnRenamed("b1", "b2")
>     .withColumn("zipped", arrays_zip(col("a2"), col("b2")))
> )
>
> df.printSchema()
> # root
> #  |-- a2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- b2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- zipped: array (nullable = true)
> #  |    |-- element: struct (containsNull = false)
> #  |    |    |-- a2: string (nullable = true)
> #  |    |    |-- b2: string (nullable = true)
>
> df.write.save("test.parquet")
> spark.read.load("test.parquet").printSchema()
> # root
> #  |-- a2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- b2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- zipped: array (nullable = true)
> #  |    |-- element: struct (containsNull = true)
> #  |    |    |-- a1: string (nullable = true)
> #  |    |    |-- b1: string (nullable = true)
> {code}
> I would expect the schema of the DataFrame written to disk to be the same as
> the one printed out. It seems that instead of using the renamed column
> names, it uses the old column names.
[jira] [Resolved] (SPARK-35876) array_zip unexpected column names
[ https://issues.apache.org/jira/browse/SPARK-35876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-35876.
----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33106
[https://github.com/apache/spark/pull/33106]

> array_zip unexpected column names
> ---------------------------------
>
>                 Key: SPARK-35876
>                 URL: https://issues.apache.org/jira/browse/SPARK-35876
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.2
>            Reporter: Derk Crezee
>            Assignee: Kousuke Saruta
>            Priority: Major
>             Fix For: 3.2.0
>
>
> When I'm using the arrays_zip function in combination with renamed columns,
> I get an unexpected schema written to disk.
> {code:java}
> from pyspark.sql import *
> from pyspark.sql.functions import *
>
> spark = SparkSession.builder.getOrCreate()
>
> data = [
>     Row(a1=["a", "a"], b1=["b", "b"]),
> ]
> df = (
>     spark.sparkContext.parallelize(data).toDF()
>     .withColumnRenamed("a1", "a2")
>     .withColumnRenamed("b1", "b2")
>     .withColumn("zipped", arrays_zip(col("a2"), col("b2")))
> )
>
> df.printSchema()
> # root
> #  |-- a2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- b2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- zipped: array (nullable = true)
> #  |    |-- element: struct (containsNull = false)
> #  |    |    |-- a2: string (nullable = true)
> #  |    |    |-- b2: string (nullable = true)
>
> df.write.save("test.parquet")
> spark.read.load("test.parquet").printSchema()
> # root
> #  |-- a2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- b2: array (nullable = true)
> #  |    |-- element: string (containsNull = true)
> #  |-- zipped: array (nullable = true)
> #  |    |-- element: struct (containsNull = true)
> #  |    |    |-- a1: string (nullable = true)
> #  |    |    |-- b1: string (nullable = true)
> {code}
> I would expect the schema of the DataFrame written to disk to be the same as
> the one printed out. It seems that instead of using the renamed column
> names, it uses the old column names.
[jira] [Assigned] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4
[ https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35922:
------------------------------------

    Assignee: (was: Apache Spark)

> Upgrade maven-shade-plugin to 3.2.4
> -----------------------------------
>
>                 Key: SPARK-35922
>                 URL: https://issues.apache.org/jira/browse/SPARK-35922
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Commented] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4
[ https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371014#comment-17371014 ]

Apache Spark commented on SPARK-35922:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33122

> Upgrade maven-shade-plugin to 3.2.4
> -----------------------------------
>
>                 Key: SPARK-35922
>                 URL: https://issues.apache.org/jira/browse/SPARK-35922
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Assigned] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4
[ https://issues.apache.org/jira/browse/SPARK-35922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35922:
------------------------------------

    Assignee: Apache Spark

> Upgrade maven-shade-plugin to 3.2.4
> -----------------------------------
>
>                 Key: SPARK-35922
>                 URL: https://issues.apache.org/jira/browse/SPARK-35922
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Dongjoon Hyun
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Created] (SPARK-35922) Upgrade maven-shade-plugin to 3.2.4
Dongjoon Hyun created SPARK-35922:
----------------------------------

             Summary: Upgrade maven-shade-plugin to 3.2.4
                 Key: SPARK-35922
                 URL: https://issues.apache.org/jira/browse/SPARK-35922
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 3.2.0
            Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-34537) Repartition miss/duplicated data
[ https://issues.apache.org/jira/browse/SPARK-34537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

angerszhu resolved SPARK-34537.
-------------------------------
    Resolution: Not A Problem

> Repartition miss/duplicated data
> --------------------------------
>
>                 Key: SPARK-34537
>                 URL: https://issues.apache.org/jira/browse/SPARK-34537
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: angerszhu
>            Priority: Major
>         Attachments: image-2021-02-25-19-43-49-687.png, image-2021-02-25-19-46-52-809.png, image-2021-02-25-19-47-10-005.png
>
>
> We have a SQL:
> {code:java}
> INSERT OVERWRITE TABLE t1
> SELECT /*+ repartition(300) */ * from t2
> {code}
> Below are the SQL metrics of the repartition ShuffleExchange. We can see
> that the shuffle records written and records read are not the same.
> In the result table, some data is missing and some data is duplicated.
> !image-2021-02-25-19-43-49-687.png!
> !image-2021-02-25-19-46-52-809.png|width=408,height=654!
> !image-2021-02-25-19-47-10-005.png|width=282,height=414!
> We can see that *InsertIntoHadoopFsRelationCommand's output is the same as
> the repartition Exchange's records read (reducer side)*,
> *and the repartition Exchange's shuffle records written (mapper side) is
> the same as the Filter's output.*
> *So the repartition Exchange returns wrong data.*
>
> *In our env, AQE and speculation are enabled.*
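For context on why a keyless repartition can misbehave under speculation: if a retried map task consumes its input in a different order, round-robin distribution sends rows to different output partitions across attempts. The ticket does not state this mechanism, so treat it as a plausible illustration only; the sketch below is plain Python, not Spark code:

```python
def round_robin(rows, num_partitions):
    # Distribute rows round-robin, the way repartition(n) spreads
    # rows when no partitioning key is given.
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts

# A task attempt and its speculative retry may see the same input rows
# in a different order, so each attempt produces different partitions:
attempt_1 = round_robin(["x", "y", "z"], 2)  # [['x', 'z'], ['y']]
attempt_2 = round_robin(["z", "x", "y"], 2)  # [['z', 'y'], ['x']]

# A reducer that fetches partition 0 from attempt_1 and partition 1
# from attempt_2 reads "x" twice and never reads "y":
combined = attempt_1[0] + attempt_2[1]
print(combined)  # ['x', 'z', 'x'] -- "x" duplicated, "y" lost
```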
[jira] [Commented] (SPARK-35914) Driver can't distribute task to executor because NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-35914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371009#comment-17371009 ]

Helt Long commented on SPARK-35914:
-----------------------------------

I guess this problem is related to the Hadoop version: I use CDH-5.7.1 (Hadoop 2.6.5), while Spark 3 builds against Hadoop 2.7. The other problem I found in the Spark web UI was also caused by the version, so I will try a higher Hadoop version to confirm it. [SPARK-35802] Error loading the stages/stage/ page in spark UI - ASF JIRA (apache.org)

> Driver can't distribute task to executor because NullPointerException
> ---------------------------------------------------------------------
>
>                 Key: SPARK-35914
>                 URL: https://issues.apache.org/jira/browse/SPARK-35914
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1, 3.1.1, 3.1.2
>         Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark 3.0.1, 3.1.1, 3.1.2
>            Reporter: Helt Long
>            Priority: Major
>         Attachments: stuck log.png, webui stuck.png
>
> When submitting a Spark 3 job to a YARN cluster, once in a while the driver
> can't distribute any tasks to any executor, the stage gets stuck, and the
> whole job gets stuck. Checking the driver log, I found a
> NullPointerException. It looks like a Netty problem. I can confirm this
> problem only exists in Spark 3, because it never happened with Spark 2.
>
> {code:java}
> // Error message
> 21/06/28 14:42:43 INFO TaskSetManager: Starting task 2592.0 in stage 1.0 (TID 3494) (worker39.hadoop, executor 84, partition 2592, RACK_LOCAL, 5006 bytes) taskResourceAssignments Map()
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 4155.0 in stage 1.0 (TID 3367) in 36670 ms on worker39.hadoop (executor 84) (3278/4249)
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 2283.0 in stage 1.0 (TID 3422) in 22371 ms on worker15.hadoop (executor 109) (3279/4249)
> 21/06/28 14:42:43 ERROR Inbox: Ignoring error
> java.lang.NullPointerException
>         at java.lang.String.length(String.java:623)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:420)
>         at java.lang.StringBuilder.append(StringBuilder.java:136)
>         at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$5(TaskSetManager.scala:483)
>         at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
>         at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
>         at org.apache.spark.scheduler.TaskSetManager.logInfo(TaskSetManager.scala:54)
>         at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:484)
>         at scala.Option.map(Option.scala:230)
>         at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:444)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:397)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:392)
>         at scala.Option.foreach(Option.scala:407)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:392)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:383)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:581)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:576)
>         at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>         at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:576)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:547)
>         at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:547)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$5(CoarseGrainedSchedulerBackend.scala:340)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:904)
>         at
[jira] [Assigned] (SPARK-35921) ${spark.yarn.isHadoopProvided} in config.properties is not edited if build with SBT
[ https://issues.apache.org/jira/browse/SPARK-35921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35921:
------------------------------------

    Assignee: Apache Spark (was: Kousuke Saruta)

> ${spark.yarn.isHadoopProvided} in config.properties is not edited if build
> with SBT
> --------------------------------------------------------------------------
>
>                 Key: SPARK-35921
>                 URL: https://issues.apache.org/jira/browse/SPARK-35921
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Kousuke Saruta
>            Assignee: Apache Spark
>            Priority: Minor
>
> The yarn sub-module contains config.properties:
> {code}
> spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided}
> {code}
> The ${spark.yarn.isHadoopProvided} part is replaced with true or false at
> build time, depending on whether Hadoop is provided (specified by
> -Phadoop-provided).
> The edited config.properties is loaded at runtime to control how the
> Hadoop-related classpath is populated.
> If we build with Maven, this process works, but it doesn't with SBT.
[jira] [Assigned] (SPARK-35921) ${spark.yarn.isHadoopProvided} in config.properties is not edited if build with SBT
[ https://issues.apache.org/jira/browse/SPARK-35921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35921:
------------------------------------

    Assignee: Kousuke Saruta (was: Apache Spark)

> ${spark.yarn.isHadoopProvided} in config.properties is not edited if build
> with SBT
> --------------------------------------------------------------------------
>
>                 Key: SPARK-35921
>                 URL: https://issues.apache.org/jira/browse/SPARK-35921
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Minor
>
> The yarn sub-module contains config.properties:
> {code}
> spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided}
> {code}
> The ${spark.yarn.isHadoopProvided} part is replaced with true or false at
> build time, depending on whether Hadoop is provided (specified by
> -Phadoop-provided).
> The edited config.properties is loaded at runtime to control how the
> Hadoop-related classpath is populated.
> If we build with Maven, this process works, but it doesn't with SBT.
[jira] [Commented] (SPARK-35921) ${spark.yarn.isHadoopProvided} in config.properties is not edited if build with SBT
[ https://issues.apache.org/jira/browse/SPARK-35921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371007#comment-17371007 ] Apache Spark commented on SPARK-35921: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33121 > ${spark.yarn.isHadoopProvided} in config.properties is not edited if build > with SBT > --- > > Key: SPARK-35921 > URL: https://issues.apache.org/jira/browse/SPARK-35921 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The yarn sub-module contains config.properties. > {code} > spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided} > {code} > The ${spark.yarn.isHadoopProvided} part is replaced with true or false at > build time, depending on whether Hadoop is provided (specified by > -Phadoop-provided). > The edited config.properties is loaded at runtime to control how to > populate the Hadoop-related classpath. > If we build with Maven, this process works, but it doesn't with SBT. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35802) Error loading the stages/stage/ page in spark UI
[ https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Helt Long resolved SPARK-35802. --- Resolution: Not A Bug I tried hadoop-2.7.5 and the problem does not exist there, so I confirm it happens because I used hadoop-2.6.5. Sorry for this; I closed it.
> Error loading the stages/stage/ page in spark UI
>
> Key: SPARK-35802
> URL: https://issues.apache.org/jira/browse/SPARK-35802
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5, Spark on yarn cluster mode
> Reporter: Helt Long
> Priority: Major
> Attachments: spark3.1.2-request-20210628093538.png, spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png
>
> When I try to load the Spark UI page for a specific stage, I get the following error:
> {quote}Unable to connect to the server. Looks like the Spark application must have ended. Please switch to the history UI.{quote}
> Obviously the server is still alive and processes new messages. Looking at the network tab shows that one of the requests fails:
> {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'}}
> which returns an HTTP 500 error page ("Problem accessing /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed.", served by Jetty 9.4.z-SNAPSHOT).
> Requests to any other object that I've tested seem to work, for example:
> {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}
> The exception is:
> {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
> javax.servlet.ServletException: java.lang.NullPointerException
> at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
> at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
> at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
> at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
> at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
> at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
> at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
> at org.sparkproject.jetty.server.Server.handle(Server.java:505)
> at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
> at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
> at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
> ... (remaining frames truncated in the original message)}}
[jira] [Assigned] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34302: --- Assignee: Terry Kim (was: Max Gekk) > Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework > > > Key: SPARK-34302 > URL: https://issues.apache.org/jira/browse/SPARK-34302 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Terry Kim >Priority: Major > Fix For: 3.2.0 > > > # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN > # Remove AlterTableAlterColumnStatement > # Remove the check verifyAlterTableType() from run() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34302. - Resolution: Fixed Issue resolved by pull request 33113 [https://github.com/apache/spark/pull/33113] > Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework > > > Key: SPARK-34302 > URL: https://issues.apache.org/jira/browse/SPARK-34302 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN > # Remove AlterTableAlterColumnStatement > # Remove the check verifyAlterTableType() from run() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35888) Add dataSize field in CoalescedPartitionSpec
[ https://issues.apache.org/jira/browse/SPARK-35888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35888. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33079 [https://github.com/apache/spark/pull/33079] > Add dataSize field in CoalescedPartitionSpec > > > Key: SPARK-35888 > URL: https://issues.apache.org/jira/browse/SPARK-35888 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.2.0 > > > Currently, the test suites for `CoalescedPartitionSpec` do not check the > data size because it doesn't contain a data size field. > We can add a data size field to `CoalescedPartitionSpec` and then add test cases for > better coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35888) Add dataSize field in CoalescedPartitionSpec
[ https://issues.apache.org/jira/browse/SPARK-35888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35888: --- Assignee: XiDuo You > Add dataSize field in CoalescedPartitionSpec > > > Key: SPARK-35888 > URL: https://issues.apache.org/jira/browse/SPARK-35888 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > > Currently, the test suites for `CoalescedPartitionSpec` do not check the > data size because it doesn't contain a data size field. > We can add a data size field to `CoalescedPartitionSpec` and then add test cases for > better coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
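The proposed change amounts to carrying a data size alongside the reducer index range, so tests can assert on the coalesced bytes as well. A hypothetical Python analogue of the Scala case class (the field names here are assumptions based on the issue text, not the actual Spark API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CoalescedPartitionSpec:
    # Stand-in for Spark's Scala case class of the same name; the reducer
    # index fields mirror its existing shape, and data_size is the field
    # the issue proposes to add.
    start_reducer_index: int
    end_reducer_index: int
    data_size: Optional[int] = None  # bytes covered by this coalesced partition

# With the new field, a test can check the coalesced size, not just the range:
spec = CoalescedPartitionSpec(start_reducer_index=0, end_reducer_index=4,
                              data_size=1024)
```

The `Optional` default keeps existing call sites that never computed a size working unchanged.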
[jira] [Updated] (SPARK-35921) ${spark.yarn.isHadoopProvided} in config.properties is not edited if build with SBT
[ https://issues.apache.org/jira/browse/SPARK-35921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35921: --- Summary: ${spark.yarn.isHadoopProvided} in config.properties is not edited if build with SBT (was: The value of spark.yarn.isHadoopProvided property in config.properties is not edited if build with SBT) > ${spark.yarn.isHadoopProvided} in config.properties is not edited if build > with SBT > --- > > Key: SPARK-35921 > URL: https://issues.apache.org/jira/browse/SPARK-35921 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The yarn sub-module contains config.properties. > {code} > spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided} > {code} > The ${spark.yarn.isHadoopProvided} part is replaced with true or false at > build time, depending on whether Hadoop is provided (specified by > -Phadoop-provided). > The edited config.properties is loaded at runtime to control how to > populate the Hadoop-related classpath. > If we build with Maven, this process works, but it doesn't with SBT. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35921) The value of spark.yarn.isHadoopProvided property in config.properties is not edited if build with SBT
Kousuke Saruta created SPARK-35921: -- Summary: The value of spark.yarn.isHadoopProvided property in config.properties is not edited if build with SBT Key: SPARK-35921 URL: https://issues.apache.org/jira/browse/SPARK-35921 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta The yarn sub-module contains config.properties. {code} spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided} {code} The ${spark.yarn.isHadoopProvided} part is replaced with true or false at build time, depending on whether Hadoop is provided (specified by -Phadoop-provided). The edited config.properties is loaded at runtime to control how to populate the Hadoop-related classpath. If we build with Maven, this process works, but it doesn't with SBT. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
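The build-time substitution the issue describes can be sketched in a few lines. This is an illustrative model of Maven-style resource filtering (the function is invented for illustration; the real build uses Maven's resource filtering, which SBT was not replicating):

```python
def filter_properties(text: str, props: dict) -> str:
    """Replace Maven-style ${key} placeholders with values from props.

    A simplified model of the resource filtering that rewrites
    config.properties at build time. Placeholders with no matching
    key are left untouched, which is exactly the broken SBT behavior
    the issue reports: the file ships with the literal placeholder.
    """
    for key, value in props.items():
        text = text.replace("${" + key + "}", value)
    return text

template = "spark.yarn.isHadoopProvided = ${spark.yarn.isHadoopProvided}\n"
# Building with -Phadoop-provided would substitute "true" here:
filtered = filter_properties(template, {"spark.yarn.isHadoopProvided": "true"})
```

Note that when the properties map is empty (the unfixed SBT case), the template passes through unchanged and the runtime reads the raw `${...}` string.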
[jira] [Commented] (SPARK-26764) [SPIP] Spark Relational Cache
[ https://issues.apache.org/jira/browse/SPARK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370993#comment-17370993 ] Adrian Wang commented on SPARK-26764: - [~zshao] Thanks for the interest. We created an open-source plugin: [https://github.com/alibaba/SparkCube], to demonstrate the basic ideas. > [SPIP] Spark Relational Cache > - > > Key: SPARK-26764 > URL: https://issues.apache.org/jira/browse/SPARK-26764 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Adrian Wang >Priority: Major > Attachments: Relational+Cache+SPIP.pdf > > > In modern database systems, relational cache is a common technology to boost > ad-hoc queries. While Spark provides cache natively, Spark SQL should be able > to utilize the relationship between relations to boost all possible queries. > In this SPIP, we will make Spark be able to utilize all defined cached > relations if possible, without explicit substitution in user query, as well > as keep some user defined cache available in different sessions. Materialized > views in many database systems provide similar function. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions
[ https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370991#comment-17370991 ] Apache Spark commented on SPARK-35899: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/33120 > Add a utility to convert connector expressions to Catalyst expressions > -- > > Key: SPARK-35899 > URL: https://issues.apache.org/jira/browse/SPARK-35899 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.2.0 > > > There are more and more places that require converting a v2 connector > expression to an internal Catalyst expression. We need to build a utility > method to avoid having the same logic in a lot of places. > See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions
[ https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370990#comment-17370990 ] Apache Spark commented on SPARK-35899: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/33120 > Add a utility to convert connector expressions to Catalyst expressions > -- > > Key: SPARK-35899 > URL: https://issues.apache.org/jira/browse/SPARK-35899 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.2.0 > > > There are more and more places that require converting a v2 connector > expression to an internal Catalyst expression. We need to build a utility > method to avoid having the same logic in a lot of places. > See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
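The shape of the utility SPARK-35899 asks for is a single translation function with one case per connector node, shared by every call site. A toy Python sketch (all class names here are invented stand-ins; the real code would map Spark's v2 connector `Expression` classes to Catalyst expressions in Scala):

```python
from dataclasses import dataclass

# Invented stand-ins for v2 connector expression nodes.
@dataclass
class V2FieldReference:
    name: str

@dataclass
class V2Literal:
    value: object

# Invented stand-ins for internal Catalyst nodes.
@dataclass
class AttributeReference:
    name: str

@dataclass
class Literal:
    value: object

def to_catalyst(expr):
    """One shared conversion utility, so each call site no longer
    re-implements the v2-to-Catalyst mapping (illustrative only)."""
    if isinstance(expr, V2FieldReference):
        return AttributeReference(expr.name)
    if isinstance(expr, V2Literal):
        return Literal(expr.value)
    raise ValueError(f"Unsupported connector expression: {expr!r}")
```

Centralizing the mapping also gives one place to raise a consistent error for unsupported nodes.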
[jira] [Resolved] (SPARK-33898) Support SHOW CREATE TABLE in v2
[ https://issues.apache.org/jira/browse/SPARK-33898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-33898. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32931 [https://github.com/apache/spark/pull/32931] > Support SHOW CREATE TABLE in v2 > --- > > Key: SPARK-33898 > URL: https://issues.apache.org/jira/browse/SPARK-33898 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: PengLei >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33898) Support SHOW CREATE TABLE in v2
[ https://issues.apache.org/jira/browse/SPARK-33898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-33898: Assignee: PengLei > Support SHOW CREATE TABLE in v2 > --- > > Key: SPARK-33898 > URL: https://issues.apache.org/jira/browse/SPARK-33898 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: PengLei >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-35344. --- Fix Version/s: 3.2.0 Assignee: Xinrong Meng Resolution: Fixed Issue resolved by pull request 32955 https://github.com/apache/spark/pull/32955 > Support creating a Column of numpy literal value in pandas-on-Spark > --- > > Key: SPARK-35344 > URL: https://issues.apache.org/jira/browse/SPARK-35344 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > Spark (the `lit` function defined in `pyspark.sql.functions`) doesn't support > creating a Column out of a numpy literal value. > So the `lit` function defined in `pyspark.pandas.spark.functions` should be > adjusted in order to support that in pandas-on-Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35920) Upgrade to Chill 0.10.0
[ https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370971#comment-17370971 ] Apache Spark commented on SPARK-35920: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33119 > Upgrade to Chill 0.10.0 > --- > > Key: SPARK-35920 > URL: https://issues.apache.org/jira/browse/SPARK-35920 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35920) Upgrade to Chill 0.10.0
[ https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35920: Assignee: (was: Apache Spark) > Upgrade to Chill 0.10.0 > --- > > Key: SPARK-35920 > URL: https://issues.apache.org/jira/browse/SPARK-35920 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35920) Upgrade to Chill 0.10.0
[ https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370970#comment-17370970 ] Apache Spark commented on SPARK-35920: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33119 > Upgrade to Chill 0.10.0 > --- > > Key: SPARK-35920 > URL: https://issues.apache.org/jira/browse/SPARK-35920 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35920) Upgrade to Chill 0.10.0
[ https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35920: Assignee: Apache Spark > Upgrade to Chill 0.10.0 > --- > > Key: SPARK-35920 > URL: https://issues.apache.org/jira/browse/SPARK-35920 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35920) Upgrade to Chill 0.10.0
Dongjoon Hyun created SPARK-35920: - Summary: Upgrade to Chill 0.10.0 Key: SPARK-35920 URL: https://issues.apache.org/jira/browse/SPARK-35920 Project: Spark Issue Type: Task Components: Build Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35919) Support pathlib.PurePath-like objects in DataFrameReader / DataFrameWriter
Andrew Grigorev created SPARK-35919: --- Summary: Support pathlib.PurePath-like objects in DataFrameReader / DataFrameWriter Key: SPARK-35919 URL: https://issues.apache.org/jira/browse/SPARK-35919 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.1.2, 2.4.8 Reporter: Andrew Grigorev It would be nice to support Path objects in `spark.\{read,write}.\{parquet,orc,csv,...etc}` methods. Without pyspark source code changes it currently seems possible only by the ugly monkeypatching hacks - https://stackoverflow.com/q/68170685/2649222. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35917) Disable push-based shuffle until the feature is complete
[ https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35917: Assignee: Apache Spark > Disable push-based shuffle until the feature is complete > > > Key: SPARK-35917 > URL: https://issues.apache.org/jira/browse/SPARK-35917 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Assignee: Apache Spark >Priority: Major > > Push-based shuffle is partially merged in Apache master, but some of the tasks > are still incomplete. Since 3.2 is going to be cut soon, we will not be able to > get the pending tasks reviewed and merged. A few of the pending tasks make > protocol changes to the push-based shuffle protocols, so we would like to > prevent users from enabling push-based shuffle both on the client and the > server until the push-based shuffle implementation is complete. > We can prevent push-based shuffle from being used by throwing > {{UnsupportedOperationException}} (or something like that) both on the client > and the server when the user tries to enable it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35917) Disable push-based shuffle until the feature is complete
[ https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35917: Assignee: (was: Apache Spark) > Disable push-based shuffle until the feature is complete > > > Key: SPARK-35917 > URL: https://issues.apache.org/jira/browse/SPARK-35917 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > Push-based shuffle is partially merged in Apache master, but some of the tasks > are still incomplete. Since 3.2 is going to be cut soon, we will not be able to > get the pending tasks reviewed and merged. A few of the pending tasks make > protocol changes to the push-based shuffle protocols, so we would like to > prevent users from enabling push-based shuffle both on the client and the > server until the push-based shuffle implementation is complete. > We can prevent push-based shuffle from being used by throwing > {{UnsupportedOperationException}} (or something like that) both on the client > and the server when the user tries to enable it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35917) Disable push-based shuffle until the feature is complete
[ https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370877#comment-17370877 ] Apache Spark commented on SPARK-35917: -- User 'otterc' has created a pull request for this issue: https://github.com/apache/spark/pull/33118 > Disable push-based shuffle until the feature is complete > > > Key: SPARK-35917 > URL: https://issues.apache.org/jira/browse/SPARK-35917 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > Push-based shuffle is partially merged in Apache master, but some of the tasks > are still incomplete. Since 3.2 is going to be cut soon, we will not be able to > get the pending tasks reviewed and merged. A few of the pending tasks make > protocol changes to the push-based shuffle protocols, so we would like to > prevent users from enabling push-based shuffle both on the client and the > server until the push-based shuffle implementation is complete. > We can prevent push-based shuffle from being used by throwing > {{UnsupportedOperationException}} (or something like that) both on the client > and the server when the user tries to enable it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
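The guard described above could look like the following sketch. The config key `spark.shuffle.push.enabled` is Spark's real one, but the validation function is illustrative, and Python's `NotImplementedError` stands in for the `UnsupportedOperationException` the issue suggests on the JVM side:

```python
def check_push_based_shuffle(conf: dict) -> None:
    """Reject push-based shuffle while the feature is incomplete.

    Sketch of the client/server-side check the issue proposes:
    fail fast at configuration time rather than letting a
    half-implemented protocol be exercised.
    """
    if conf.get("spark.shuffle.push.enabled", "false").lower() == "true":
        raise NotImplementedError(
            "Push-based shuffle is not yet complete and cannot be enabled")
```

The same check would run on both the client and the shuffle server, so neither side can be enabled independently.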
[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated SPARK-35881: --- Description: In AdaptiveSparkPlanExec, a query is broken down into stages and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The supportsColumnar method also always returns false. In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection, determine whether that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec. We would like a supported mechanism for executing a columnar AQE plan so that we do not need to use reflection. was: In AdaptiveSparkPlanExec, a query is broken down into stages and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, which is another limitation. However, AQE is special because we don't know if the final stage will be columnar or not until the child stages have been executed and the final stage has been re-planned and re-optimized, so we can't easily change the behavior of supportsColumnar. We can't just implement doExecuteColumnar because we don't know whether the final stage will be columnar or not until after we start executing the query. 
In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection, determine whether that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec. I propose that we make getFinalPhysicalPlan public, and part of the developer API, so that columnar plugins can call this method, determine whether the final stage is columnar or not, and execute it appropriately. This would not affect any existing Spark functionality. We also need a mechanism for invoking finalPlanUpdate after the query has been executed. > [SQL] AQE does not support columnar execution for the final query stage > --- > > Key: SPARK-35881 > URL: https://issues.apache.org/jira/browse/SPARK-35881 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Andy Grove >Priority: Major > > In AdaptiveSparkPlanExec, a query is broken down into stages and these stages > are executed until the entire query has been executed. These stages can be > row-based or columnar. However, the final stage, produced by the private > getFinalPhysicalPlan method, is always assumed to be row-based. The only way > to execute the final stage is by calling the various doExecute methods on > AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The > supportsColumnar method also always returns false. > In the RAPIDS Accelerator for Apache Spark, we currently call the private > getFinalPhysicalPlan method using reflection, determine whether that plan > is columnar or not, and then call the appropriate doExecute method, bypassing > the doExecute methods on AdaptiveSparkPlanExec. We would like a supported > mechanism for executing a columnar AQE plan so that we do not need to use > reflection. 
> > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
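The dispatch a plugin must perform today, after obtaining the final plan (currently via reflection on the private `getFinalPhysicalPlan`), is essentially the following. The class and method names mirror Spark's `supportsColumnar`/`doExecute`/`doExecuteColumnar`, but the sketch itself is an illustrative Python model, not Spark code:

```python
class FinalStage:
    # Minimal stand-in for a physical plan node. Whether the stage is
    # columnar is only known after AQE re-plans the final stage, which
    # is why the plugin must inspect it at execution time.
    def __init__(self, columnar: bool):
        self._columnar = columnar

    def supports_columnar(self) -> bool:
        return self._columnar

    def do_execute(self) -> str:
        return "row-based RDD"

    def do_execute_columnar(self) -> str:
        return "columnar RDD"

def execute_final_stage(stage: FinalStage) -> str:
    """Dispatch on columnar support -- the logic a columnar plugin runs
    once it has the final plan, bypassing AdaptiveSparkPlanExec's
    row-only doExecute."""
    if stage.supports_columnar():
        return stage.do_execute_columnar()
    return stage.do_execute()
```

A supported API would let the plugin call this dispatch without reflection, which is what the issue requests.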
[jira] [Created] (SPARK-35918) Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch handling and error messages
Erik Krogen created SPARK-35918: --- Summary: Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch handling and error messages Key: SPARK-35918 URL: https://issues.apache.org/jira/browse/SPARK-35918 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.2 Reporter: Erik Krogen While working on [PR #31490|https://github.com/apache/spark/pull/31490] for SPARK-34365, we discussed that there is room for improvement in how schema mismatch errors are reported ([comment1|https://github.com/apache/spark/pull/31490#discussion_r659970793], [comment2|https://github.com/apache/spark/pull/31490#issuecomment-869866848]). We can also consolidate more of the logic between AvroSerializer and AvroDeserializer to avoid some duplication of error handling and consolidate how these error messages are generated. This will essentially be taking the [logic from the initial proposal from PR #31490|https://github.com/apache/spark/pull/31490/commits/83a922fdff08528e59233f67930ac78bfb3fa178], but applied separately from the current set of proposed changes to cut down on PR size. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
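The consolidation could center on one shared formatter for schema-mismatch errors, called from both the serializer and the deserializer instead of each building its own message. A hedged sketch (the function and argument names are assumptions for illustration, not the actual Spark API):

```python
def schema_mismatch_message(direction, avro_path, catalyst_path, reason):
    """Build one consistent error message for Avro <-> Catalyst schema
    mismatches.

    AvroSerializer and AvroDeserializer would both call this instead of
    duplicating the formatting logic, so errors report the full field
    path on both sides in the same shape (illustrative only).
    """
    return (
        f"Cannot convert Avro field '{'.'.join(avro_path)}' to SQL field "
        f"'{'.'.join(catalyst_path)}' during {direction}: {reason}"
    )
```

Sharing the formatter also means a future improvement to the message (say, printing both schemas) lands in serialization and deserialization at once.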
[jira] [Assigned] (SPARK-35910) Update remoteBlockBytes based on merged block info
[ https://issues.apache.org/jira/browse/SPARK-35910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35910: - Assignee: Kent Yao > Update remoteBlockBytes based on merged block info > -- > > Key: SPARK-35910 > URL: https://issues.apache.org/jira/browse/SPARK-35910 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > Currently, we calculate the `remoteBlockBytes` based on the original block > info list. If the original reducer size is big but the actual reducer size is > small due to automatic partition coalescing of AQE, the reducer will take > more time to calculate `remoteBlockBytes`. We can reduce this cost via remote > requests which contain merged block info lists. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35910) Update remoteBlockBytes based on merged block info
[ https://issues.apache.org/jira/browse/SPARK-35910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35910. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33109 [https://github.com/apache/spark/pull/33109] > Update remoteBlockBytes based on merged block info > -- > > Key: SPARK-35910 > URL: https://issues.apache.org/jira/browse/SPARK-35910 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.2.0 > > > Currently, we calculate the `remoteBlockBytes` based on the original block > info list. If the original reducer size is big but the actual reducer size is > small due to automatic partition coalescing of AQE, the reducer will take > more time to calculate `remoteBlockBytes`. We can reduce this cost via remote > requests which contain merged block info lists. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
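The cost argument in SPARK-35910 can be illustrated with a library-free sketch (illustrative code, not Spark's actual implementation): summing one entry per merged block visits far fewer list elements than summing one entry per original block, while the total stays the same.

```java
import java.util.List;

public class RemoteBytesDemo {
    // Hypothetical helper: total remote bytes is just the sum of the block
    // sizes in whatever list the reducer is handed; the cost of computing it
    // scales with the length of that list.
    static long remoteBlockBytes(List<Long> blockSizes) {
        long total = 0;
        for (long size : blockSizes) total += size;
        return total;
    }

    public static void main(String[] args) {
        List<Long> original = List.of(10L, 20L, 30L, 40L); // one entry per block
        List<Long> merged = List.of(30L, 70L);             // pre-merged per request
        // Same total, but the merged list is half the length to iterate.
        System.out.println(remoteBlockBytes(original) == remoteBlockBytes(merged));
    }
}
```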
[jira] [Commented] (SPARK-35859) Cleanup type hints.
[ https://issues.apache.org/jira/browse/SPARK-35859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370842#comment-17370842 ] Apache Spark commented on SPARK-35859: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33117 > Cleanup type hints. > --- > > Key: SPARK-35859 > URL: https://issues.apache.org/jira/browse/SPARK-35859 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > - Consolidate the declaration of type vars, type aliases, etc. > - Rename type vars, like {{T_Frame}}, {{T_IndexOps}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35859) Cleanup type hints.
[ https://issues.apache.org/jira/browse/SPARK-35859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35859: Assignee: Apache Spark > Cleanup type hints. > --- > > Key: SPARK-35859 > URL: https://issues.apache.org/jira/browse/SPARK-35859 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > - Consolidate the declaration of type vars, type aliases, etc. > - Rename type vars, like {{T_Frame}}, {{T_IndexOps}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35859) Cleanup type hints.
[ https://issues.apache.org/jira/browse/SPARK-35859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35859: Assignee: (was: Apache Spark) > Cleanup type hints. > --- > > Key: SPARK-35859 > URL: https://issues.apache.org/jira/browse/SPARK-35859 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > - Consolidate the declaration of type vars, type aliases, etc. > - Rename type vars, like {{T_Frame}}, {{T_IndexOps}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35859) Cleanup type hints.
[ https://issues.apache.org/jira/browse/SPARK-35859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370840#comment-17370840 ] Apache Spark commented on SPARK-35859: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33117 > Cleanup type hints. > --- > > Key: SPARK-35859 > URL: https://issues.apache.org/jira/browse/SPARK-35859 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > - Consolidate the declaration of type vars, type aliases, etc. > - Rename type vars, like {{T_Frame}}, {{T_IndexOps}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name
[ https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370792#comment-17370792 ] Apache Spark commented on SPARK-35259: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/33116 > ExternalBlockHandler metrics have misleading unit in the name > - > > Key: SPARK-35259 > URL: https://issues.apache.org/jira/browse/SPARK-35259 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics: > {code} > // Time latency for open block request in ms > private final Timer openBlockRequestLatencyMillis = new Timer(); > // Time latency for executor registration latency in ms > private final Timer registerExecutorRequestLatencyMillis = new Timer(); > // Time latency for processing fetch merged blocks meta request latency > in ms > private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer(); > // Time latency for processing finalize shuffle merge request latency in > ms > private final Timer finalizeShuffleMergeLatencyMillis = new Timer(); > {code} > However these Dropwizard Timers by default use nanoseconds > ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]). > It's certainly possible to extract milliseconds from them, but it seems > misleading to have millis in the name here. > This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics > like {{openBlockRequestLatencyMillis_count}} and > {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics > exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust > the name accordingly, so the unit shouldn't be included in the name of the > metric itself. 
[jira] [Assigned] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name
[ https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35259: Assignee: Apache Spark > ExternalBlockHandler metrics have misleading unit in the name > - > > Key: SPARK-35259 > URL: https://issues.apache.org/jira/browse/SPARK-35259 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Assignee: Apache Spark >Priority: Major > > Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics: > {code} > // Time latency for open block request in ms > private final Timer openBlockRequestLatencyMillis = new Timer(); > // Time latency for executor registration latency in ms > private final Timer registerExecutorRequestLatencyMillis = new Timer(); > // Time latency for processing fetch merged blocks meta request latency > in ms > private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer(); > // Time latency for processing finalize shuffle merge request latency in > ms > private final Timer finalizeShuffleMergeLatencyMillis = new Timer(); > {code} > However these Dropwizard Timers by default use nanoseconds > ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]). > It's certainly possible to extract milliseconds from them, but it seems > misleading to have millis in the name here. > This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics > like {{openBlockRequestLatencyMillis_count}} and > {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics > exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust > the name accordingly, so the unit shouldn't be included in the name of the > metric itself. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name
[ https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370790#comment-17370790 ] Apache Spark commented on SPARK-35259: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/33116 > ExternalBlockHandler metrics have misleading unit in the name > - > > Key: SPARK-35259 > URL: https://issues.apache.org/jira/browse/SPARK-35259 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics: > {code} > // Time latency for open block request in ms > private final Timer openBlockRequestLatencyMillis = new Timer(); > // Time latency for executor registration latency in ms > private final Timer registerExecutorRequestLatencyMillis = new Timer(); > // Time latency for processing fetch merged blocks meta request latency > in ms > private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer(); > // Time latency for processing finalize shuffle merge request latency in > ms > private final Timer finalizeShuffleMergeLatencyMillis = new Timer(); > {code} > However these Dropwizard Timers by default use nanoseconds > ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]). > It's certainly possible to extract milliseconds from them, but it seems > misleading to have millis in the name here. > This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics > like {{openBlockRequestLatencyMillis_count}} and > {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics > exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust > the name accordingly, so the unit shouldn't be included in the name of the > metric itself. 
[jira] [Assigned] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name
[ https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35259: Assignee: (was: Apache Spark) > ExternalBlockHandler metrics have misleading unit in the name > - > > Key: SPARK-35259 > URL: https://issues.apache.org/jira/browse/SPARK-35259 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics: > {code} > // Time latency for open block request in ms > private final Timer openBlockRequestLatencyMillis = new Timer(); > // Time latency for executor registration latency in ms > private final Timer registerExecutorRequestLatencyMillis = new Timer(); > // Time latency for processing fetch merged blocks meta request latency > in ms > private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer(); > // Time latency for processing finalize shuffle merge request latency in > ms > private final Timer finalizeShuffleMergeLatencyMillis = new Timer(); > {code} > However these Dropwizard Timers by default use nanoseconds > ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]). > It's certainly possible to extract milliseconds from them, but it seems > misleading to have millis in the name here. > This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics > like {{openBlockRequestLatencyMillis_count}} and > {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics > exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust > the name accordingly, so the unit shouldn't be included in the name of the > metric itself. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35259) ExternalBlockHandler metrics have misleading unit in the name
[ https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-35259: Description: Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics: {code} // Time latency for open block request in ms private final Timer openBlockRequestLatencyMillis = new Timer(); // Time latency for executor registration latency in ms private final Timer registerExecutorRequestLatencyMillis = new Timer(); // Time latency for processing fetch merged blocks meta request latency in ms private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer(); // Time latency for processing finalize shuffle merge request latency in ms private final Timer finalizeShuffleMergeLatencyMillis = new Timer(); {code} However these Dropwizard Timers by default use nanoseconds ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]). It's certainly possible to extract milliseconds from them, but it seems misleading to have millis in the name here. This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics like {{openBlockRequestLatencyMillis_count}} and {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust the name accordingly, so the unit shouldn't be included in the name of the metric itself. 
was: Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics: {code} // Time latency for open block request in ms private final Timer openBlockRequestLatencyMillis = new Timer(); // Time latency for executor registration latency in ms private final Timer registerExecutorRequestLatencyMillis = new Timer(); // Time latency for processing finalize shuffle merge request latency in ms private final Timer finalizeShuffleMergeLatencyMillis = new Timer(); {code} However these Dropwizard Timers by default use nanoseconds ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]). It's certainly possible to extract milliseconds from them, but it seems misleading to have millis in the name here. {{YarnShuffleServiceMetrics}} currently doesn't expose any incorrectly-named metrics since it doesn't export any timing information from these metrics (which I am trying to address in SPARK-35258), but these names still result in kind of misleading metric names like {{finalizeShuffleMergeLatencyMillis_count}} -- a count doesn't have a unit. It should be up to the metrics exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust the name accordingly. 
> ExternalBlockHandler metrics have misleading unit in the name > - > > Key: SPARK-35259 > URL: https://issues.apache.org/jira/browse/SPARK-35259 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics: > {code} > // Time latency for open block request in ms > private final Timer openBlockRequestLatencyMillis = new Timer(); > // Time latency for executor registration latency in ms > private final Timer registerExecutorRequestLatencyMillis = new Timer(); > // Time latency for processing fetch merged blocks meta request latency > in ms > private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer(); > // Time latency for processing finalize shuffle merge request latency in > ms > private final Timer finalizeShuffleMergeLatencyMillis = new Timer(); > {code} > However these Dropwizard Timers by default use nanoseconds > ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]). > It's certainly possible to extract milliseconds from them, but it seems > misleading to have millis in the name here. > This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics > like {{openBlockRequestLatencyMillis_count}} and > {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics > exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust > the name accordingly, so the unit shouldn't be included in the name of the > metric itself. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
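The unit confusion in SPARK-35259 can be demonstrated without Dropwizard itself (a library-free sketch; the names are illustrative): a timer stores its samples in nanoseconds regardless of what the metric is called, and the exporter chooses the output unit at read time, so a "Millis" suffix in the stored metric's name asserts nothing about the stored values.

```java
import java.util.concurrent.TimeUnit;

public class TimerUnitDemo {
    // The exporter, not the metric name, decides the reported unit: it takes
    // the raw nanosecond value and converts at read time.
    static long exportAsMillis(long recordedNanos) {
        return TimeUnit.NANOSECONDS.toMillis(recordedNanos);
    }

    public static void main(String[] args) {
        // A 25 ms latency is stored internally as 25,000,000 ns. A metric
        // named "...LatencyMillis" holding this raw value would be misleading
        // unless the exporter converts it.
        long recorded = 25_000_000L;
        System.out.println(exportAsMillis(recorded));
    }
}
```

This is why the issue proposes dropping the unit from the metric name and letting exporters such as {{YarnShuffleServiceMetrics}} append a unit suffix that matches whatever conversion they actually apply.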
[jira] [Commented] (SPARK-26764) [SPIP] Spark Relational Cache
[ https://issues.apache.org/jira/browse/SPARK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370770#comment-17370770 ] Zheng Shao commented on SPARK-26764: [~adrian-wang] It has been over 2 years since this issue was created. Can you give us an update on the latest status of this effort so far? > [SPIP] Spark Relational Cache > - > > Key: SPARK-26764 > URL: https://issues.apache.org/jira/browse/SPARK-26764 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Adrian Wang >Priority: Major > Attachments: Relational+Cache+SPIP.pdf > > > In modern database systems, relational cache is a common technology to boost > ad-hoc queries. While Spark provides cache natively, Spark SQL should be able > to utilize the relationship between relations to boost all possible queries. > In this SPIP, we will make Spark be able to utilize all defined cached > relations if possible, without explicit substitution in user query, as well > as keep some user defined cache available in different sessions. Materialized > views in many database systems provide similar function. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370768#comment-17370768 ] Zheng Shao commented on SPARK-29038: [~cltlfcjin] and [~AidenZhang]. I also recently started to look at materialized views. This is a huge opportunity for us to improve query performance. It has been almost a year since the last update. Are there any new updates from your side? > SPIP: Support Spark Materialized View > - > > Key: SPARK-29038 > URL: https://issues.apache.org/jira/browse/SPARK-29038 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Lantao Jin >Priority: Major > > Materialized view is an important approach in DBMS to cache data to > accelerate queries. By creating a materialized view through SQL, the data > that can be cached is very flexible, and needs to be configured arbitrarily > according to specific usage scenarios. The Materialization Manager > automatically updates the cache data according to changes in detail source > tables, simplifying user work. When user submit query, Spark optimizer > rewrites the execution plan based on the available materialized view to > determine the optimal execution plan. > Details in [design > doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35917) Disable push-based shuffle until the feature is complete
[ https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated SPARK-35917: -- Description: Push-based shuffle is partially merged in apache master but some of the tasks are still incomplete. Since 3.2 is going to be cut soon, we will not be able to get the pending tasks reviewed and merged. A few of the pending tasks make protocol changes to the push-based shuffle protocols, so we would like to prevent users from enabling push-based shuffle both on the client and the server until the push-based shuffle implementation is complete. We can prevent push-based shuffle from being used by throwing {{UnsupportedOperationException}} (or something like that) both on the client and the server when the user tries to enable it. > Disable push-based shuffle until the feature is complete > > > Key: SPARK-35917 > URL: https://issues.apache.org/jira/browse/SPARK-35917 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > Push-based shuffle is partially merged in apache master but some of the tasks > are still incomplete. Since 3.2 is going to be cut soon, we will not be able to > get the pending tasks reviewed and merged. A few of the pending tasks make > protocol changes to the push-based shuffle protocols, so we would like to > prevent users from enabling push-based shuffle both on the client and the > server until the push-based shuffle implementation is complete. > We can prevent push-based shuffle from being used by throwing > {{UnsupportedOperationException}} (or something like that) both on the client > and the server when the user tries to enable it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
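The guard proposed in SPARK-35917 can be sketched in plain Java: fail fast with {{UnsupportedOperationException}} when the flag is set. The config key and helper below are illustrative stand-ins, not Spark's actual implementation.

```java
import java.util.Map;

public class PushShuffleGuard {
    // Hypothetical config key name, used here only for illustration.
    static final String PUSH_ENABLED_KEY = "spark.shuffle.push.enabled";

    // Reject the incomplete feature at configuration time, on both client
    // and server, rather than failing later with a protocol mismatch.
    static void checkPushShuffleDisabled(Map<String, String> conf) {
        if (Boolean.parseBoolean(conf.getOrDefault(PUSH_ENABLED_KEY, "false"))) {
            throw new UnsupportedOperationException(
                "Push-based shuffle is not yet fully implemented");
        }
    }

    public static void main(String[] args) {
        try {
            checkPushShuffleDisabled(Map.of(PUSH_ENABLED_KEY, "true"));
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at configuration time keeps users from silently running a half-merged protocol, which is the stated goal of the issue.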
[jira] [Created] (SPARK-35917) Disable push-based shuffle until the feature is complete
Chandni Singh created SPARK-35917: - Summary: Disable push-based shuffle until the feature is complete Key: SPARK-35917 URL: https://issues.apache.org/jira/browse/SPARK-35917 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Affects Versions: 3.1.0 Reporter: Chandni Singh -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35898) Converting arrays with RowToColumnConverter triggers assertion
[ https://issues.apache.org/jira/browse/SPARK-35898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-35898. --- Fix Version/s: 3.1.3 3.2.0 Resolution: Fixed > Converting arrays with RowToColumnConverter triggers assertion > -- > > Key: SPARK-35898 > URL: https://issues.apache.org/jira/browse/SPARK-35898 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > When trying to convert a row that contains an array to a ColumnVector with > RowToColumnConverter the following error is thrown: > {code:java} > java.lang.AssertionError at > org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.putArray(OffHeapColumnVector.java:560) > at > org.apache.spark.sql.execution.vectorized.WritableColumnVector.appendArray(WritableColumnVector.java:622) > at > org.apache.spark.sql.execution.RowToColumnConverter$ArrayConverter.append(Columnar.scala:353) > at > org.apache.spark.sql.execution.RowToColumnConverter$BasicNullableTypeConverter.append(Columnar.scala:241) > at > org.apache.spark.sql.execution.RowToColumnConverter.convert(Columnar.scala:221) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35898) Converting arrays with RowToColumnConverter triggers assertion
[ https://issues.apache.org/jira/browse/SPARK-35898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell reassigned SPARK-35898: - Assignee: Tom van Bussel > Converting arrays with RowToColumnConverter triggers assertion > -- > > Key: SPARK-35898 > URL: https://issues.apache.org/jira/browse/SPARK-35898 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > > When trying to convert a row that contains an array to a ColumnVector with > RowToColumnConverter the following error is thrown: > {code:java} > java.lang.AssertionError at > org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.putArray(OffHeapColumnVector.java:560) > at > org.apache.spark.sql.execution.vectorized.WritableColumnVector.appendArray(WritableColumnVector.java:622) > at > org.apache.spark.sql.execution.RowToColumnConverter$ArrayConverter.append(Columnar.scala:353) > at > org.apache.spark.sql.execution.RowToColumnConverter$BasicNullableTypeConverter.append(Columnar.scala:241) > at > org.apache.spark.sql.execution.RowToColumnConverter.convert(Columnar.scala:221) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35802) Error loading the stages/stage/ page in spark UI
[ https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370626#comment-17370626 ] Helt Long commented on SPARK-35802: --- Sounds like that's the key point. I will try a higher Hadoop version and close this myself. Thanks [~sarutak]! > Error loading the stages/stage/ page in spark UI > > > Key: SPARK-35802 > URL: https://issues.apache.org/jira/browse/SPARK-35802 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2 > Environment: CDH 5.7.1: Hadoop 2.6.5 > Spark on yarn cluster mode >Reporter: Helt Long >Priority: Major > Attachments: spark3.1.2-request-20210628093538.png, > spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png > > > I try to load the sparkUI page for a specific stage, I get the following > error: > {quote}Unable to connect to the server. Looks like the Spark application must > have ended. Please Switch to the history UI. > {quote} > Obviously the server is still alive and process new messages. > Looking at the network tab shows one of the requests fails: > > {{curl > 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable' > > > > Error 500 Request failed. > > HTTP ERROR 500 > Problem accessing > /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: > Request failed. 
href="http://eclipse.org/jetty;>Powered by Jetty:// 9.4.z-SNAPSHOT > > }} > requests to any other object that I've tested seem to work, for example > > {{curl > 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}} > > The exception is: > {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable > javax.servlet.ServletException: java.lang.NullPointerException > at > org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) > at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) > at > org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) > at > org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) > at > org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > at > org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) > at > org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) > at > org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) > at > org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) > at > 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753) > at > org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) > at > org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at org.sparkproject.jetty.server.Server.handle(Server.java:505) > at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370) > at > org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) > at > org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) > at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103) > at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) > at > org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) > at >
[jira] [Commented] (SPARK-35802) Error loading the stages/stage/ page in spark UI
[ https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370590#comment-17370590 ] Kousuke Saruta commented on SPARK-35802:
Hadoop version is 2.9.0 in my environment. I ran an application in yarn-cluster mode, but the issue was not reproduced.
> Error loading the stages/stage/ page in spark UI
>
> Key: SPARK-35802
> URL: https://issues.apache.org/jira/browse/SPARK-35802
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark on yarn cluster mode
> Reporter: Helt Long
> Priority: Major
> Attachments: spark3.1.2-request-20210628093538.png, spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png
>
> When I try to load the Spark UI page for a specific stage, I get the following error:
> {quote}Unable to connect to the server. Looks like the Spark application must have ended. Please switch to the history UI.
> {quote}
> Obviously the server is still alive and processes new messages.
> Looking at the network tab shows that one of the requests fails:
> {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable'}}
> Error 500 Request failed. HTTP ERROR 500. Problem accessing /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed. Powered by Jetty // 9.4.z-SNAPSHOT
> Requests to any other object that I've tested seem to work, for example:
> {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}}
> The exception is:
> /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable
> javax.servlet.ServletException: java.lang.NullPointerException
> at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
> at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
> at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
> at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
> at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
> at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
> at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
> at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
> at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
> at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
> at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
> at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
> at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.sparkproject.jetty.server.Server.handle(Server.java:505)
> at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
> at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
> at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
> at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
> at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
> at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
> at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
> at
[jira] [Assigned] (SPARK-35916) Support subtraction among Date/Timestamp/TimestampWithoutTZ
[ https://issues.apache.org/jira/browse/SPARK-35916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35916: Assignee: Gengliang Wang (was: Apache Spark) > Support subtraction among Date/Timestamp/TimestampWithoutTZ > --- > > Key: SPARK-35916 > URL: https://issues.apache.org/jira/browse/SPARK-35916 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Support the following operations: > * TimestampWithoutTZ - Date > * Date - TimestampWithoutTZ > * TimestampWithoutTZ - Timestamp > * Timestamp - TimestampWithoutTZ > * TimestampWithoutTZ - TimestampWithoutTZ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35916) Support subtraction among Date/Timestamp/TimestampWithoutTZ
[ https://issues.apache.org/jira/browse/SPARK-35916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370578#comment-17370578 ] Apache Spark commented on SPARK-35916: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/33115 > Support subtraction among Date/Timestamp/TimestampWithoutTZ > --- > > Key: SPARK-35916 > URL: https://issues.apache.org/jira/browse/SPARK-35916 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Support the following operations: > * TimestampWithoutTZ - Date > * Date - TimestampWithoutTZ > * TimestampWithoutTZ - Timestamp > * Timestamp - TimestampWithoutTZ > * TimestampWithoutTZ - TimestampWithoutTZ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35916) Support subtraction among Date/Timestamp/TimestampWithoutTZ
[ https://issues.apache.org/jira/browse/SPARK-35916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35916: Assignee: Apache Spark (was: Gengliang Wang) > Support subtraction among Date/Timestamp/TimestampWithoutTZ > --- > > Key: SPARK-35916 > URL: https://issues.apache.org/jira/browse/SPARK-35916 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Support the following operations: > * TimestampWithoutTZ - Date > * Date - TimestampWithoutTZ > * TimestampWithoutTZ - Timestamp > * Timestamp - TimestampWithoutTZ > * TimestampWithoutTZ - TimestampWithoutTZ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35916) Support subtraction among Date/Timestamp/TimestampWithoutTZ
Gengliang Wang created SPARK-35916: -- Summary: Support subtraction among Date/Timestamp/TimestampWithoutTZ Key: SPARK-35916 URL: https://issues.apache.org/jira/browse/SPARK-35916 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Support the following operations: * TimestampWithoutTZ - Date * Date - TimestampWithoutTZ * TimestampWithoutTZ - Timestamp * Timestamp - TimestampWithoutTZ * TimestampWithoutTZ - TimestampWithoutTZ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
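All of the pairings listed in the ticket subtract one point-in-time value from another and yield an interval rather than a plain number. A rough sketch of the intended semantics, using Python's datetime module as a stand-in for Spark's Date/Timestamp types (the helper name is ours, not a Spark API; in Spark the analogous result would be a DayTimeIntervalType value, with timedelta playing that role here):

```python
from datetime import date, datetime, timedelta

# Illustrative only: ts_minus_date is our own name, not a Spark function.
def ts_minus_date(ts: datetime, d: date) -> timedelta:
    # Promote the date to a timestamp at midnight of that day, then subtract.
    return ts - datetime(d.year, d.month, d.day)

print(ts_minus_date(datetime(2021, 6, 28, 12, 0, 0), date(2021, 6, 27)))
# 1 day, 12:00:00
```

The same promote-then-subtract shape covers the other pairings in the list, with the date operand promoted whenever the other side carries a time component.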
[jira] [Commented] (SPARK-35912) [SQL] JSON read behavior is different depending on the cache setting when nullable is false.
[ https://issues.apache.org/jira/browse/SPARK-35912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370512#comment-17370512 ] Fu Chen commented on SPARK-35912:
Working on this.
> [SQL] JSON read behavior is different depending on the cache setting when nullable is false.
>
> Key: SPARK-35912
> URL: https://issues.apache.org/jira/browse/SPARK-35912
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.1
> Reporter: Heedo Lee
> Priority: Minor
>
> Below is the reproduction code:
>
> {code:java}
> import org.apache.spark.sql.Encoders
>
> case class TestSchema(x: Int, y: Int)
> case class BaseSchema(value: TestSchema)
>
> val schema = Encoders.product[BaseSchema].schema
> val testDS = Seq("""{"value":{"x":1}}""", """{"value":{"x":2}}""").toDS
> val jsonDS = spark.read.schema(schema).json(testDS)
> jsonDS.show
> +---------+
> |    value|
> +---------+
> |{1, null}|
> |{2, null}|
> +---------+
>
> jsonDS.cache.show
> +------+
> | value|
> +------+
> |{1, 0}|
> |{2, 0}|
> +------+
> {code}
>
> The above result occurs when the schema is created with a nested StructType whose StructFields have nullable set to false.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
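The heart of the report is that the same rows yield null for the missing field before caching but the int default 0 after caching, because the declared schema promises the fields are non-nullable while the JSON omits y. A toy Python sketch of the two behaviors (not Spark code; all names here are ours):

```python
import json

# Toy model: the same rows read with "nullable" semantics (a missing field
# stays None) versus with the int default substituted in its place (what the
# cached, nullable=false path effectively produced in the report above).
RAW = ['{"value":{"x":1}}', '{"value":{"x":2}}']

def read_nullable(rows):
    return [(json.loads(r)["value"].get("x"),
             json.loads(r)["value"].get("y")) for r in rows]

def read_with_default(rows, default=0):
    return [(json.loads(r)["value"].get("x", default),
             json.loads(r)["value"].get("y", default)) for r in rows]

print(read_nullable(RAW))      # [(1, None), (2, None)]
print(read_with_default(RAW))  # [(1, 0), (2, 0)]
```

Either behavior on its own might be defensible; the bug is that adding .cache silently switches from one to the other.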
[jira] [Created] (SPARK-35915) Kafka doesn't recover from data loss
Yuval Yellin created SPARK-35915:
Summary: Kafka doesn't recover from data loss
Key: SPARK-35915
URL: https://issues.apache.org/jira/browse/SPARK-35915
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.1.1
Reporter: Yuval Yellin

I configured a structured streaming source for Kafka with failOnDataLoss=false. I get this error when the checkpoint offsets are not found:
{code:java}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 5.0 failed 1 times, most recent failure: Lost task 7.0 in stage 5.0 (TID 113) (executor driver): java.lang.IllegalStateException: This consumer has already been closed.
at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2439)
at org.apache.kafka.clients.consumer.KafkaConsumer.seekToBeginning(KafkaConsumer.java:1656)
at org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumer.getAvailableOffsetRange(KafkaDataConsumer.scala:108)
at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.getEarliestAvailableOffsetBetween(KafkaDataConsumer.scala:385)
at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:332)
at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:604)
at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.get(KafkaDataConsumer.scala:287)
at org.apache.spark.sql.kafka010.KafkaBatchPartitionReader.next(KafkaBatchPartitionReader.scala:63)
at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:79)
at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:112)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
{code}
The issue seems to me to be related to the OffsetOutOfRangeException handling (around line 323 in KafkaDataConsumer):
{code:java}
case e: OffsetOutOfRangeException =>
  // When there is some error thrown, it's better to use a new consumer to drop all cached
  // states in the old consumer. We don't need to worry about the performance because this
  // is not a common path.
  releaseConsumer()
  fetchedData.reset()
  reportDataLoss(topicPartition, groupId, failOnDataLoss, s"Cannot fetch offset $toFetchOffset", e)
  toFetchOffset = getEarliestAvailableOffsetBetween(consumer, toFetchOffset, untilOffset)
{code}
It seems that releaseConsumer will destroy the consumer, which is later used ...
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
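A minimal model of the suspected use-after-release, with toy names of our own (this is not the actual Spark or Kafka API): releasing the consumer closes it, and the very next seek on the same reference fails with exactly the "already been closed" error in the stack trace above; the sketched fix re-acquires a fresh consumer before seeking.

```python
class ToyConsumer:
    """Stand-in for a Kafka consumer; models only the open/closed state."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def seek_to_beginning(self):
        if self.closed:
            raise RuntimeError("This consumer has already been closed.")
        return 0  # pretend earliest available offset

def recover_buggy(consumer):
    consumer.close()                      # releaseConsumer()
    return consumer.seek_to_beginning()   # raises: same reference reused

def recover_fixed(consumer, acquire):
    consumer.close()                      # releaseConsumer()
    return acquire().seek_to_beginning()  # re-acquire a fresh consumer first
```

Here recover_buggy(ToyConsumer()) raises, while recover_fixed(ToyConsumer(), ToyConsumer) returns the earliest offset, mirroring the distinction the reporter suspects in the real code path.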
[jira] [Updated] (SPARK-35914) Driver can't distribute task to executor because NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-35914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Helt Long updated SPARK-35914: -- Attachment: webui stuck.png
> Driver can't distribute task to executor because NullPointerException
>
> Key: SPARK-35914
> URL: https://issues.apache.org/jira/browse/SPARK-35914
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.1, 3.1.1, 3.1.2
> Environment: CDH 5.7.1: Hadoop 2.6.5
> Spark 3.0.1, 3.1.1, 3.1.2
> Reporter: Helt Long
> Priority: Major
> Attachments: stuck log.png, webui stuck.png
>
> When I use Spark 3 to submit a job to a YARN cluster, once in a while the driver can't distribute any tasks to any executors; the stage gets stuck, and the whole Spark job hangs. Checking the driver log, I found a NullPointerException. It looks like a Netty problem. I can confirm this problem only exists in Spark 3, because it never happened when I used Spark 2.
>
> {code:java}
> // Error message
> 21/06/28 14:42:43 INFO TaskSetManager: Starting task 2592.0 in stage 1.0 (TID 3494) (worker39.hadoop, executor 84, partition 2592, RACK_LOCAL, 5006 bytes) taskResourceAssignments Map()
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 4155.0 in stage 1.0 (TID 3367) in 36670 ms on worker39.hadoop (executor 84) (3278/4249)
> 21/06/28 14:42:43 INFO TaskSetManager: Finished task 2283.0 in stage 1.0 (TID 3422) in 22371 ms on worker15.hadoop (executor 109) (3279/4249)
> 21/06/28 14:42:43 ERROR Inbox: Ignoring error
> java.lang.NullPointerException
> at java.lang.String.length(String.java:623)
> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:420)
> at java.lang.StringBuilder.append(StringBuilder.java:136)
> at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$5(TaskSetManager.scala:483)
> at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
> at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
> at
org.apache.spark.scheduler.TaskSetManager.logInfo(TaskSetManager.scala:54) > at > org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:484) > at scala.Option.map(Option.scala:230) > at > org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:444) > at > org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:397) > at > org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:392) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:392) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:383) > at > org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:581) > at > org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:576) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) > at > org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:576) > at > org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:547) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:547) > at > 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$5(CoarseGrainedSchedulerBackend.scala:340) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:904) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:332) > at >
[jira] [Updated] (SPARK-35914) Driver can't distribute task to executor because NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-35914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Helt Long updated SPARK-35914: -- Attachment: stuck log.png
[jira] [Assigned] (SPARK-35904) Collapse above RebalancePartitions
[ https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35904: Assignee: Yuming Wang (was: Apache Spark) > Collapse above RebalancePartitions > -- > > Key: SPARK-35904 > URL: https://issues.apache.org/jira/browse/SPARK-35904 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > Make RebalancePartitions extends RepartitionOperation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35904) Collapse above RebalancePartitions
[ https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35904: Assignee: Apache Spark (was: Yuming Wang) > Collapse above RebalancePartitions > -- > > Key: SPARK-35904 > URL: https://issues.apache.org/jira/browse/SPARK-35904 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > Make RebalancePartitions extends RepartitionOperation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35904) Collapse above RebalancePartitions
[ https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-35904: Fix Version/s: (was: 3.2.0) > Collapse above RebalancePartitions > -- > > Key: SPARK-35904 > URL: https://issues.apache.org/jira/browse/SPARK-35904 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > Make RebalancePartitions extends RepartitionOperation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-35904) Collapse above RebalancePartitions
[ https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reopened SPARK-35904: - Reverted at https://github.com/apache/spark/commit/108635af1708173a72bec0e36bf3f2cea5b088c4 > Collapse above RebalancePartitions > -- > > Key: SPARK-35904 > URL: https://issues.apache.org/jira/browse/SPARK-35904 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > Make RebalancePartitions extends RepartitionOperation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN
[ https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-35672: - Fix Version/s: 3.1.3 > Spark fails to launch executors with very large user classpath lists on YARN > > > Key: SPARK-35672 > URL: https://issues.apache.org/jira/browse/SPARK-35672 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 3.1.2 > Environment: Linux RHEL7 > Spark 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > When running Spark on YARN, the {{user-class-path}} argument to > {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to > executor processes. The argument is specified once for each JAR, and the URIs > are fully-qualified, so the paths can be quite long. With large user JAR > lists (say 1000+), this can result in system-level argument length limits > being exceeded, typically manifesting as the error message: > {code} > /bin/bash: Argument list too long > {code} > A [Google > search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22] > indicates that this is not a theoretical problem and afflicts real users, > including ours. This issue was originally observed on Spark 2.3, but has been > confirmed to exist in the master branch as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
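A back-of-the-envelope sketch of the failure mode described above: one --user-class-path flag per jar, each with a fully-qualified URI, adds up fast. The URI layout below is invented for illustration; real limits vary by kernel (on Linux the whole argv+envp block is bounded by ARG_MAX, and a single argument by MAX_ARG_STRLEN), so larger jar counts or deeper paths push past them.

```python
# Invented URI shape, for illustration only.
base = "hdfs://namenode.example.com:8020/user/someone/app/libs/dependency-artifact"

args = []
for i in range(1000):
    # CoarseGrainedExecutorBackend receives one --user-class-path flag per jar.
    args += ["--user-class-path", f"{base}-{i}.jar"]

# Each argv entry costs its length plus a terminating NUL byte.
total = sum(len(a) + 1 for a in args)
print(len(args), total)  # 2000 arguments, ~100 KB of argv for just 1000 jars
```

Multiply the jar count or path depth by a few and the exec of the executor launch command fails with `/bin/bash: Argument list too long`, matching the report.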
[jira] [Commented] (SPARK-35802) Error loading the stages/stage/ page in spark UI
[ https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370491#comment-17370491 ] Helt Long commented on SPARK-35802:
[~sarutak] I used CDH 5.7.1 and Spark on YARN in cluster mode; this problem happens all the time. When I googled this problem, I found the same problem on Stack Overflow, so I moved the question there: [Error loading the stages/stage/ page in spark UI - Stack Overflow|https://stackoverflow.com/questions/64265444/error-loading-the-stages-stage-id-page-in-spark-ui]. I can reproduce the problem 100% of the time. Environment details: CDH 5.7.1: Hadoop 2.6.5, Spark on YARN in cluster mode.
[jira] [Comment Edited] (SPARK-35802) Error loading the stages/stage/ page in spark UI
[ https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370491#comment-17370491 ] Helt Long edited comment on SPARK-35802 at 6/28/21, 8:33 AM:
[~sarutak] I used CDH 5.7.1 and Spark on YARN in cluster mode; this problem happens all the time. When I googled this problem, I found the same problem on Stack Overflow, so I moved the question there: [Error loading the stages/stage/ page in spark UI - Stack Overflow|https://stackoverflow.com/questions/64265444/error-loading-the-stages-stage-id-page-in-spark-ui]. I can reproduce the problem 100% of the time. Environment details: CDH 5.7.1: Hadoop 2.6.5, Spark on YARN in cluster mode.
was (Author: heltman): [~sarutak] I used CDH 5.7.1, and used Spark on Yarn cluster mode, this problem happend all the time. When I google thhis problem, I found the same problem on stackoverflow, so I move the problem there [Error loading the stages/stage/ page in spark UI - Stack Overflow|https://stackoverflow.com/questions/64265444/error-loading-the-stages-stage-id-page-in-spark-ui] I can 100% recurrence problem I add some message about env like blow: CDH 5.7.1: Hadoop 2.6.5 Spark on yarn cluster mode
[jira] [Updated] (SPARK-35802) Error loading the stages/stage/ page in spark UI
[ https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Helt Long updated SPARK-35802: -- Description: I try to load the sparkUI page for a specific stage, I get the following error: {quote}Unable to connect to the server. Looks like the Spark application must have ended. Please Switch to the history UI. {quote} Obviously the server is still alive and process new messages. Looking at the network tab shows one of the requests fails: {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable' Error 500 Request failed. HTTP ERROR 500 Problem accessing /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed.http://eclipse.org/jetty;>Powered by Jetty:// 9.4.z-SNAPSHOT }} requests to any other object that I've tested seem to work, for example {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}} The exception is: {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable javax.servlet.ServletException: java.lang.NullPointerException at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753) at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.sparkproject.jetty.server.Server.handle(Server.java:505) at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370) at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at 
org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698) at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:175) at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:140) at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:107) at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:135) at org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:133) at org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:28) at org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:174) at org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:129) at
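The HTTP 500 above traces to a NullPointerException in StagesResource.doPagination (StagesResource.scala:175). Purely as an illustrative sketch, not Spark's actual code (the names `PaginationSketch`, `SORTABLE`, and the column set below are hypothetical), a pagination routine that resolves a client-supplied sort column through a lookup table can throw exactly this kind of NPE when the lookup misses; a defensive null check turns the opaque 500 into a clear client error:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PaginationSketch {
    // Hypothetical index mapping sortable column names to key extractors over
    // task rows (here, int[] stands in for a task record). Illustrative only.
    static final Map<String, Function<int[], Integer>> SORTABLE =
        Map.of("duration", t -> t[0], "gcTime", t -> t[1]);

    static List<int[]> paginate(List<int[]> tasks, String sortColumn, int page, int size) {
        Function<int[], Integer> key = SORTABLE.get(sortColumn);
        if (key == null) {
            // Without this guard, key.apply(...) below would throw an NPE,
            // which the servlet layer surfaces as "HTTP ERROR 500".
            throw new IllegalArgumentException("unknown sort column: " + sortColumn);
        }
        List<int[]> sorted = new ArrayList<>(tasks);
        sorted.sort(Comparator.comparing(key));
        int from = Math.min(page * size, sorted.size());
        int to = Math.min(from + size, sorted.size());
        return sorted.subList(from, to);
    }
}
```

This also matches the symptom in the report: the taskSummary endpoint (which takes no sort column) keeps working while taskTable fails.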
[jira] [Created] (SPARK-35914) Driver can't distribute task to executor because NullPointerException
Helt Long created SPARK-35914: - Summary: Driver can't distribute task to executor because NullPointerException Key: SPARK-35914 URL: https://issues.apache.org/jira/browse/SPARK-35914 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.2, 3.1.1, 3.0.1 Environment: CDH 5.7.1: Hadoop 2.6.5 Spark 3.0.1, 3.1.1, 3.1.2 Reporter: Helt Long When using Spark 3 to submit a job to a YARN cluster, I hit a problem. Once in a while, the driver can't distribute any tasks to any executors; the stage gets stuck, and the whole Spark job gets stuck. Checking the driver log, I found a NullPointerException. It looks like a Netty problem. I can confirm this problem only exists in Spark 3, because it never happened when I used Spark 2. {code:java} // Error message 21/06/28 14:42:43 INFO TaskSetManager: Starting task 2592.0 in stage 1.0 (TID 3494) (worker39.hadoop, executor 84, partition 2592, RACK_LOCAL, 5006 bytes) taskResourceAssignments Map() 21/06/28 14:42:43 INFO TaskSetManager: Finished task 4155.0 in stage 1.0 (TID 3367) in 36670 ms on worker39.hadoop (executor 84) (3278/4249) 21/06/28 14:42:43 INFO TaskSetManager: Finished task 2283.0 in stage 1.0 (TID 3422) in 22371 ms on worker15.hadoop (executor 109) (3279/4249) 21/06/28 14:42:43 ERROR Inbox: Ignoring error java.lang.NullPointerException at java.lang.String.length(String.java:623) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:420) at java.lang.StringBuilder.append(StringBuilder.java:136) at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$5(TaskSetManager.scala:483) at org.apache.spark.internal.Logging.logInfo(Logging.scala:57) at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56) at org.apache.spark.scheduler.TaskSetManager.logInfo(TaskSetManager.scala:54) at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:484) at scala.Option.map(Option.scala:230) at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:444) at 
org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:397) at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:392) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:392) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:383) at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:581) at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:576) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:576) at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:547) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:547) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$5(CoarseGrainedSchedulerBackend.scala:340) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:904) at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:332) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:157) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
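The failing frames above sit inside the logInfo call in TaskSetManager.resourceOffer: the NPE fires while the "Starting task ..." log message is being built. Without claiming this is the actual root cause of SPARK-35914, one general hazard this kind of stack trace illustrates is constructing a log string from shared mutable state that another thread can change between a null check and the append. The field and methods below are hypothetical, not Spark's code:

```java
class LogRaceSketch {
    // Hypothetical shared field that another thread may null out at any time;
    // stands in for any mutable state read while formatting a log message.
    volatile String host = "worker39.hadoop";

    // Risky: 'host' is read twice (null check, then append), so a concurrent
    // writer can make the second read observe a different value.
    String unsafeMessage() {
        if (host != null) {
            return new StringBuilder("Starting task on ").append(host.trim()).toString();
        }
        return "Starting task (host unknown)";
    }

    // Safer: snapshot the field into a local once, then use only the local,
    // so the null check and the append are guaranteed to see the same value.
    String safeMessage() {
        String h = host;
        return h != null
            ? new StringBuilder("Starting task on ").append(h.trim()).toString()
            : "Starting task (host unknown)";
    }
}
```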
[jira] [Resolved] (SPARK-35258) Enhance ESS ExternalBlockHandler with additional block rate-based metrics and histograms
[ https://issues.apache.org/jira/browse/SPARK-35258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-35258. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32388 [https://github.com/apache/spark/pull/32388] > Enhance ESS ExternalBlockHandler with additional block rate-based metrics and > histograms > > > Key: SPARK-35258 > URL: https://issues.apache.org/jira/browse/SPARK-35258 > Project: Spark > Issue Type: Improvement > Components: Shuffle, YARN >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.2.0 > > > Today the {{ExternalBlockHandler}} component of ESS exposes some useful > metrics, but is lacking around metrics for the rate of block transfers. We > have {{blockTransferRateBytes}} to tell us the rate of _bytes_, but no metric > to tell us the rate of _blocks_, which is especially relevant when running > the ESS on HDDs that are sensitive to random reads. Many small block > transfers can have a negative impact on performance, but won't show up as a > spike in {{blockTransferRateBytes}} since the sizes are small. > We can also enhance {{YarnShuffleServiceMetrics}} to expose histogram-style > metrics from the {{Timer}} instances within {{ExternalBlockHandler}} -- today > it is only exposing the count and rate, but not timing information from the > {{Snapshot}}. > These two changes can make it easier to monitor the health of the ESS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
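The byte-rate vs. block-rate distinction the description draws can be shown with a minimal sketch. Spark's real ExternalBlockHandler uses Dropwizard Metrics (Meter/Timer); the plain counters below are only an illustrative stand-in, showing why a block-count metric exposes many-small-transfers workloads that blockTransferRateBytes hides:

```java
import java.util.concurrent.atomic.AtomicLong;

class BlockMetricsSketch {
    // Minimal stand-ins for Dropwizard Meters; illustrative only.
    final AtomicLong blocksTransferred = new AtomicLong(); // rate of *blocks* (the proposed metric)
    final AtomicLong bytesTransferred = new AtomicLong();  // rate of *bytes* (the existing metric)

    void onBlockTransferred(long sizeBytes) {
        blocksTransferred.incrementAndGet();
        bytesTransferred.addAndGet(sizeBytes);
    }

    double blocksPerSecond(double elapsedSeconds) {
        return blocksTransferred.get() / elapsedSeconds;
    }

    double bytesPerSecond(double elapsedSeconds) {
        return bytesTransferred.get() / elapsedSeconds;
    }
}
```

With this sketch, 1,024 transfers of 1 KB and a single 1 MB transfer move the same number of bytes, yet differ in block count by three orders of magnitude; on HDDs, the random-read pressure from the first workload shows up only in the block-rate metric.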
[jira] [Assigned] (SPARK-35258) Enhance ESS ExternalBlockHandler with additional block rate-based metrics and histograms
[ https://issues.apache.org/jira/browse/SPARK-35258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-35258: --- Assignee: Erik Krogen > Enhance ESS ExternalBlockHandler with additional block rate-based metrics and > histograms > > > Key: SPARK-35258 > URL: https://issues.apache.org/jira/browse/SPARK-35258 > Project: Spark > Issue Type: Improvement > Components: Shuffle, YARN >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > Today the {{ExternalBlockHandler}} component of ESS exposes some useful > metrics, but is lacking around metrics for the rate of block transfers. We > have {{blockTransferRateBytes}} to tell us the rate of _bytes_, but no metric > to tell us the rate of _blocks_, which is especially relevant when running > the ESS on HDDs that are sensitive to random reads. Many small block > transfers can have a negative impact on performance, but won't show up as a > spike in {{blockTransferRateBytes}} since the sizes are small. > We can also enhance {{YarnShuffleServiceMetrics}} to expose histogram-style > metrics from the {{Timer}} instances within {{ExternalBlockHandler}} -- today > it is only exposing the count and rate, but not timing information from the > {{Snapshot}}. > These two changes can make it easier to monitor the health of the ESS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35064) Group exception messages in spark/sql (catalyst)
[ https://issues.apache.org/jira/browse/SPARK-35064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35064: --- Assignee: dgd_contributor > Group exception messages in spark/sql (catalyst) > > > Key: SPARK-35064 > URL: https://issues.apache.org/jira/browse/SPARK-35064 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: dgd_contributor >Priority: Major > > Group all errors in sql/catalyst/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35064) Group exception messages in spark/sql (catalyst)
[ https://issues.apache.org/jira/browse/SPARK-35064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35064. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32916 [https://github.com/apache/spark/pull/32916] > Group exception messages in spark/sql (catalyst) > > > Key: SPARK-35064 > URL: https://issues.apache.org/jira/browse/SPARK-35064 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: dgd_contributor >Priority: Major > Fix For: 3.2.0 > > > Group all errors in sql/catalyst/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
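"Grouping exception messages" in this umbrella effort means centralizing error construction in dedicated objects (as Spark does with e.g. QueryCompilationErrors) rather than scattering message strings across the codebase. As a hedged sketch of the pattern only, with hypothetical method names and messages, not the actual grouped errors:

```java
// One class per error category; each error condition gets a factory method,
// so all message text lives in a single auditable place.
final class QueryErrorsSketch {
    private QueryErrorsSketch() {}

    static IllegalArgumentException unresolvedColumnError(String name) {
        return new IllegalArgumentException("Column '" + name + "' cannot be resolved.");
    }

    static UnsupportedOperationException unsupportedDataTypeError(String type) {
        return new UnsupportedOperationException("Data type '" + type + "' is not supported.");
    }
}
```

Call sites then throw `QueryErrorsSketch.unresolvedColumnError(col)` instead of building ad-hoc strings, which keeps wording consistent and makes messages easy to review and test in one place.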
[jira] [Commented] (SPARK-35802) Error loading the stages/stage/ page in spark UI
[ https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370448#comment-17370448 ] Kousuke Saruta commented on SPARK-35802: [~Heltman] According to the URL, I guess you run your application on YARN. I ran spark-shell on YARN with Spark 3.1.2, but this issue didn't happen... Could you narrow down the conditions to reproduce this issue? > Error loading the stages/stage/ page in spark UI > > > Key: SPARK-35802 > URL: https://issues.apache.org/jira/browse/SPARK-35802 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2 >Reporter: Helt Long >Priority: Major > Attachments: spark3.1.2-request-20210628093538.png, > spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png > > > I try to load the sparkUI page for a specific stage, I get the following > error: > {quote}Unable to connect to the server. Looks like the Spark application must > have ended. Please Switch to the history UI. > {quote} > Obviously the server is still alive and process new messages. > Looking at the network tab shows one of the requests fails: > > {{curl > 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable' > > > > Error 500 Request failed. > > HTTP ERROR 500 > Problem accessing > /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: > Request failed. 
href="http://eclipse.org/jetty;>Powered by Jetty:// 9.4.z-SNAPSHOT > > }} > requests to any other object that I've tested seem to work, for example > > {{curl > 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}} > > The exception is: > {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable > javax.servlet.ServletException: java.lang.NullPointerException > at > org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) > at > org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) > at > org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) > at > org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) > at > org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > at > org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) > at > org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) > at > org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) > at > org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) > at > 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753) > at > org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) > at > org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at org.sparkproject.jetty.server.Server.handle(Server.java:505) > at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370) > at > org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) > at > org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) > at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103) > at > org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) > at >
[jira] [Reopened] (SPARK-35802) Error loading the stages/stage/ page in spark UI
[ https://issues.apache.org/jira/browse/SPARK-35802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta reopened SPARK-35802: > Error loading the stages/stage/ page in spark UI > > > Key: SPARK-35802 > URL: https://issues.apache.org/jira/browse/SPARK-35802 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0, 3.0.1, 3.1.1, 3.1.2 >Reporter: Helt Long >Priority: Major > Attachments: spark3.1.2-request-20210628093538.png, > spark3.1.2-stage-20210628093549.png, spark3.1.2-webui-20210628093559.png > > > I try to load the sparkUI page for a specific stage, I get the following > error: > {quote}Unable to connect to the server. Looks like the Spark application must > have ended. Please Switch to the history UI. > {quote} > Obviously the server is still alive and process new messages. > Looking at the network tab shows one of the requests fails: > > {{curl > 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable' > > > > Error 500 Request failed. > > HTTP ERROR 500 > Problem accessing > /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: > Request failed. 
href="http://eclipse.org/jetty;>Powered by Jetty:// 9.4.z-SNAPSHOT > > }} > requests to any other object that I've tested seem to work, for example > > {{curl > 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}} > > The exception is: > {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable > javax.servlet.ServletException: java.lang.NullPointerException > at > org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) > at > org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) > at > org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) > at > org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) > at > org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > at > org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) > at > org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) > at > org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) > at > org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) > at > 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753) > at > org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) > at > org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at org.sparkproject.jetty.server.Server.handle(Server.java:505) > at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370) > at > org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) > at > org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) > at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103) > at > org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) > at > org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) > at > org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698) > at >