[jira] [Commented] (SPARK-42966) Make memory stream implement SupportsReportStatistics
[ https://issues.apache.org/jira/browse/SPARK-42966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706520#comment-17706520 ]

Shixiong Zhu commented on SPARK-42966:
--------------------------------------

cc [~kabhwan]

> Make memory stream implement SupportsReportStatistics
> -----------------------------------------------------
>
> Key: SPARK-42966
> URL: https://issues.apache.org/jira/browse/SPARK-42966

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42966) Make memory stream implement SupportsReportStatistics
Shixiong Zhu created SPARK-42966:
---------------------------------

Summary: Make memory stream implement SupportsReportStatistics
Key: SPARK-42966
URL: https://issues.apache.org/jira/browse/SPARK-42966
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 3.5.0
Reporter: Shixiong Zhu

Memory stream is very convenient for writing Structured Streaming tests. However, it doesn't implement SupportsReportStatistics. It would be great to implement SupportsReportStatistics so that we can also use memory stream to write tests that need to use stats.
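To make the request concrete, here is a minimal, Spark-free sketch of the shape such an implementation could take. The `Statistics` and `SupportsReportStatistics` traits below only mirror the names of the DataSource V2 interfaces in `org.apache.spark.sql.connector.read`; `ToyMemoryStream`, its `addData` method, and the 2-bytes-per-char size estimate are purely illustrative assumptions, not Spark code.

```scala
// Spark-free model of the DSv2 statistics contract. In real Spark these
// traits live in org.apache.spark.sql.connector.read.
import java.util.OptionalLong

trait Statistics {
  def sizeInBytes: OptionalLong
  def numRows: OptionalLong
}

trait SupportsReportStatistics {
  def estimateStatistics(): Statistics
}

// A toy in-memory "stream" that tracks the rows added so far and reports
// them as statistics, the way a MemoryStream-backed scan could.
class ToyMemoryStream extends SupportsReportStatistics {
  private var rows = Vector.empty[String]

  def addData(data: String*): Unit = rows ++= data

  override def estimateStatistics(): Statistics = new Statistics {
    // Rough size estimate: 2 bytes per char (an assumption for illustration).
    override def sizeInBytes: OptionalLong =
      OptionalLong.of(rows.map(_.length.toLong * 2).sum)
    override def numRows: OptionalLong =
      OptionalLong.of(rows.size.toLong)
  }
}
```

A real implementation would report the statistics of the data backing the current scan, so tests that exercise stats-driven code paths could feed them from a memory stream.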
[jira] [Created] (SPARK-41885) --packages may not work on Windows 11
Shixiong Zhu created SPARK-41885:
---------------------------------

Summary: --packages may not work on Windows 11
Key: SPARK-41885
URL: https://issues.apache.org/jira/browse/SPARK-41885
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.2.1
Environment: Hadoop 2.7 on Windows 11
Reporter: Shixiong Zhu

Gastón Ortiz reported an issue when using Spark 3.2.1 and Hadoop 2.7 on Windows 11. See [https://github.com/delta-io/delta/issues/1059]

It looks like the executor cannot fetch the jar files. See the critical part of the stack trace below (the full stack trace is in [https://github.com/delta-io/delta/issues/1059]):

{code:java}
org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
	at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:762)
	at org.apache.spark.util.Utils$.fetchFile(Utils.scala:549)
	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:962)
	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:954)
	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
	at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
	at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
	at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
	at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:954)
	at org.apache.spark.executor.Executor.<init>(Executor.scala:247)
	at
{code}

This is not a Delta Lake issue, as it can also be reproduced by running `pyspark --packages org.apache.kafka:kafka-clients:2.8.1`. I don't have a Windows 11 environment to debug in, so I helped Gastón Ortiz create this ticket; it would be great if anyone who has a Windows 11 environment could help investigate.
[jira] [Resolved] (SPARK-41040) Self-union streaming query may fail when using readStream.table
[ https://issues.apache.org/jira/browse/SPARK-41040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-41040.
----------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request https://github.com/apache/spark/pull/38553

> Self-union streaming query may fail when using readStream.table
> ---------------------------------------------------------------
>
> Key: SPARK-41040
> URL: https://issues.apache.org/jira/browse/SPARK-41040
[jira] [Resolved] (SPARK-41045) Pre-compute to eliminate ScalaReflection calls after deserializer is created
[ https://issues.apache.org/jira/browse/SPARK-41045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-41045.
----------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request https://github.com/apache/spark/pull/38556

> Pre-compute to eliminate ScalaReflection calls after deserializer is created
> ----------------------------------------------------------------------------
>
> Key: SPARK-41045
> URL: https://issues.apache.org/jira/browse/SPARK-41045
[jira] [Assigned] (SPARK-41045) Pre-compute to eliminate ScalaReflection calls after deserializer is created
[ https://issues.apache.org/jira/browse/SPARK-41045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu reassigned SPARK-41045:
------------------------------------
Assignee: Shixiong Zhu

> Pre-compute to eliminate ScalaReflection calls after deserializer is created
> ----------------------------------------------------------------------------
>
> Key: SPARK-41045
> URL: https://issues.apache.org/jira/browse/SPARK-41045
[jira] [Created] (SPARK-41045) Pre-compute to eliminate ScalaReflection calls after deserializer is created
Shixiong Zhu created SPARK-41045:
---------------------------------

Summary: Pre-compute to eliminate ScalaReflection calls after deserializer is created
Key: SPARK-41045
URL: https://issues.apache.org/jira/browse/SPARK-41045
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: Shixiong Zhu

Currently, when `ScalaReflection` returns a deserializer for a few complex types, such as array, map, and UDT, it creates functions that may still touch `ScalaReflection` after the deserializer is created. `ScalaReflection` is a performance bottleneck for multiple threads, as it holds multiple global locks. We can refactor `ScalaReflection.deserializerFor` to pre-compute everything that needs to touch `ScalaReflection` before creating the deserializer. After this, once the deserializer is created, it can be reused by multiple threads without touching `ScalaReflection.deserializerFor` any more, and it will be much faster.
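The proposed refactoring can be illustrated without Spark. In the sketch below, `SlowReflection` is a stand-in for `ScalaReflection` (an expensive lookup behind a global lock), and the two hypothetical `deserializer*` builders show the before/after shapes: the first re-enters the lock on every row, the second touches it exactly once, at build time.

```scala
// Stand-in for ScalaReflection: a global lock guarding an expensive lookup.
object SlowReflection {
  private val lock = new Object
  def elementConverter(typeName: String): String => Any = lock.synchronized {
    typeName match {
      case "long" => (s: String) => s.toLong
      case _      => (s: String) => s
    }
  }
}

// Problematic shape: the returned function touches SlowReflection (and its
// lock) on every single call, from every thread.
def deserializerLazy(typeName: String): String => Any =
  (s: String) => SlowReflection.elementConverter(typeName)(s)

// Pre-computed shape: resolve the converter once while building the
// deserializer; the returned closure is lock-free afterwards and can be
// shared by many threads.
def deserializerPrecomputed(typeName: String): String => Any = {
  val convert = SlowReflection.elementConverter(typeName) // one-time cost
  (s: String) => convert(s)
}
```

Both builders produce the same results; only where the lock is taken differs.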
[jira] [Assigned] (SPARK-41040) Self-union streaming query may fail when using readStream.table
[ https://issues.apache.org/jira/browse/SPARK-41040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu reassigned SPARK-41040:
------------------------------------
Assignee: Shixiong Zhu

> Self-union streaming query may fail when using readStream.table
> ---------------------------------------------------------------
>
> Key: SPARK-41040
> URL: https://issues.apache.org/jira/browse/SPARK-41040
[jira] [Created] (SPARK-41040) Self-union streaming query may fail when using readStream.table
Shixiong Zhu created SPARK-41040:
---------------------------------

Summary: Self-union streaming query may fail when using readStream.table
Key: SPARK-41040
URL: https://issues.apache.org/jira/browse/SPARK-41040
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Shixiong Zhu

In a self-union query, the batch plan created by the source will be shared by multiple nodes in the plan. When we transform the plan, the batch plan will be visited multiple times. Hence, the first visit will set the CatalogTable and the second visit will try to set it again and fail the query at [this line|https://github.com/apache/spark/blob/406d0e243cfec9b29df946e1a0e20ed5fe25e152/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L625].
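The failure mode can be modeled with toy types: in a self-union, the same source node object appears at two places in the plan, so a transform visits it twice, and a strict "set exactly once" assertion fails on the second visit. A guard that skips nodes whose CatalogTable is already populated makes the transform idempotent. `ToyBatchPlan` and both setters are illustrative names, not Spark's real plan classes.

```scala
// Toy model of a batch plan node that carries an optional CatalogTable.
final class ToyBatchPlan {
  private var catalogTable: Option[String] = None

  // Strict setter, like the failing code path: assumes each node is visited
  // exactly once, and throws on a second visit.
  def setCatalogTable(name: String): Unit = {
    require(catalogTable.isEmpty, s"CatalogTable already set: $catalogTable")
    catalogTable = Some(name)
  }

  // Idempotent setter: safe when the same node object is shared by multiple
  // parents (self-union) and therefore visited more than once.
  def setCatalogTableIfMissing(name: String): Unit =
    if (catalogTable.isEmpty) catalogTable = Some(name)

  def table: Option[String] = catalogTable
}
```

With a shared node visited twice, the strict setter fails on the second visit while the idempotent one succeeds.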
[jira] [Commented] (SPARK-40937) Support programming APIs for update/delete/merge
[ https://issues.apache.org/jira/browse/SPARK-40937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625249#comment-17625249 ]

Shixiong Zhu commented on SPARK-40937:
--------------------------------------

cc [~cloud_fan]

> Support programming APIs for update/delete/merge
> ------------------------------------------------
>
> Key: SPARK-40937
> URL: https://issues.apache.org/jira/browse/SPARK-40937
[jira] [Created] (SPARK-40937) Support programming APIs for update/delete/merge
Shixiong Zhu created SPARK-40937:
---------------------------------

Summary: Support programming APIs for update/delete/merge
Key: SPARK-40937
URL: https://issues.apache.org/jira/browse/SPARK-40937
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.4.0
Reporter: Shixiong Zhu

Today Spark supports update/delete/merge commands in SQL. It would be great to add programming APIs (Scala, Java, Python, etc.) so that users can call these commands directly rather than manually building SQL statements in their non-SQL applications.
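As a purely hypothetical illustration of what callers would gain (typed method calls instead of hand-assembled SQL strings), here is a toy in-memory table with `update`/`delete` methods. None of these names exist in Spark, and a real API would look different; this only makes the feature request concrete.

```scala
// Hypothetical sketch only: ToyTable stands in for a data source table, and
// update/delete are the programmatic counterparts of the SQL commands.
final class ToyTable(private var rows: Vector[(Int, String)]) {

  // Programmatic equivalent of: DELETE FROM t WHERE <predicate>
  def delete(predicate: ((Int, String)) => Boolean): Long = {
    val before = rows.size
    rows = rows.filterNot(predicate)
    (before - rows.size).toLong // number of rows deleted
  }

  // Programmatic equivalent of: UPDATE t SET <assignments> WHERE <predicate>
  def update(predicate: ((Int, String)) => Boolean,
             set: ((Int, String)) => (Int, String)): Long = {
    var updated = 0L
    rows = rows.map { r => if (predicate(r)) { updated += 1; set(r) } else r }
    updated // number of rows updated
  }

  def contents: Vector[(Int, String)] = rows
}
```

The appeal is that conditions and assignments are ordinary typed functions, so they compose and refactor like any other code instead of living inside SQL strings.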
[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions
[ https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582457#comment-17582457 ]

Shixiong Zhu commented on SPARK-39915:
--------------------------------------

Yeah. I would consider this a bug, since the doc of `repartition` explicitly says

{code:java}
Returns a new Dataset that has exactly `numPartitions` partitions.
{code}

> Dataset.repartition(N) may not create N partitions
> --------------------------------------------------
>
> Key: SPARK-39915
> URL: https://issues.apache.org/jira/browse/SPARK-39915
[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions
[ https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576319#comment-17576319 ]

Shixiong Zhu commented on SPARK-39915:
--------------------------------------

{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/

Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.range(10, 0).repartition(5).rdd.getNumPartitions
res0: Int = 0
{code}

> Dataset.repartition(N) may not create N partitions
> --------------------------------------------------
>
> Key: SPARK-39915
> URL: https://issues.apache.org/jira/browse/SPARK-39915
[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions
[ https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572685#comment-17572685 ]

Shixiong Zhu commented on SPARK-39915:
--------------------------------------

cc [~cloud_fan]

> Dataset.repartition(N) may not create N partitions
> --------------------------------------------------
>
> Key: SPARK-39915
> URL: https://issues.apache.org/jira/browse/SPARK-39915
[jira] [Created] (SPARK-39915) Dataset.repartition(N) may not create N partitions
Shixiong Zhu created SPARK-39915:
---------------------------------

Summary: Dataset.repartition(N) may not create N partitions
Key: SPARK-39915
URL: https://issues.apache.org/jira/browse/SPARK-39915
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.0
Reporter: Shixiong Zhu

It looks like there is a behavior change in Dataset.repartition in 3.3.0. For example, `spark.range(10, 0).repartition(5).rdd.getNumPartitions` returns 5 in Spark 3.2.0, but 0 in Spark 3.3.0.
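For context on the repro: `spark.range(10, 0)` has start = 10 and end = 0, so it is an empty dataset, just like Scala's own `10L until 0L`. The question is whether `repartition(5)` on an empty dataset should still produce exactly 5 (empty) partitions, as its doc promises. The sketch below models that contract on a local collection; `roundRobin` is an illustrative helper, not Spark code.

```scala
// spark.range(10, 0) is empty because start > end, same as a plain range.
val r = 10L until 0L
assert(r.isEmpty)

// Local model of repartition's documented contract: distribute rows
// round-robin into exactly numPartitions buckets.
def roundRobin[A](xs: Seq[A], numPartitions: Int): Seq[Seq[A]] =
  (0 until numPartitions).map { i =>
    xs.zipWithIndex.collect { case (x, j) if j % numPartitions == i => x }
  }

// Even with no data, the contract yields exactly N (empty) partitions --
// which is what Spark 3.2.0 did, and what the doc promises.
assert(roundRobin(r, 5).size == 5)
assert(roundRobin(r, 5).forall(_.isEmpty))
```

With non-empty input the same model spreads rows across exactly N buckets, e.g. `roundRobin(Seq(1, 2, 3, 4, 5, 6, 7), 3)` gives bucket sizes 3, 2, 2.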
[jira] [Resolved] (SPARK-34978) Support time-travel SQL syntax
[ https://issues.apache.org/jira/browse/SPARK-34978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-34978.
----------------------------------
Resolution: Duplicate

This has been added by SPARK-37219.

> Support time-travel SQL syntax
> ------------------------------
>
> Key: SPARK-34978
> URL: https://issues.apache.org/jira/browse/SPARK-34978
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.1.1
> Reporter: Yann Byron
> Priority: Major
>
> Some data sources have the ability to query an older snapshot, like Delta. The syntax may be like `TIMESTAMP AS OF '2018-10-18 22:15:12'` to query the snapshot closest to this time point, or `VERSION AS OF 12` to query the 12th snapshot.
[jira] [Updated] (SPARK-38080) Flaky test: StreamingQueryManagerSuite: 'awaitAnyTermination with timeout and resetTerminated'
[ https://issues.apache.org/jira/browse/SPARK-38080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-38080:
---------------------------------
Description:

{code:java}
[info] - awaitAnyTermination with timeout and resetTerminated *** FAILED *** (14 seconds, 304 milliseconds)
[info]   Did not throw SparkException when expected. Expected exception org.apache.spark.sql.streaming.StreamingQueryException to be thrown, but org.scalatest.exceptions.TestFailedException was thrown (StreamTest.scala:935)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1563)
[info]   at org.scalatest.Assertions.intercept(Assertions.scala:756)
[info]   at org.scalatest.Assertions.intercept$(Assertions.scala:746)
[info]   at org.scalatest.funsuite.AnyFunSuite.intercept(AnyFunSuite.scala:1563)
[info]   at org.apache.spark.sql.streaming.StreamTest$AwaitTerminationTester$.$anonfun$test$4(StreamTest.scala:935)
[info]   at org.scalatest.Assertions.withClue(Assertions.scala:1065)
[info]   at org.scalatest.Assertions.withClue$(Assertions.scala:1052)
[info]   at org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1563)
[info]   at org.apache.spark.sql.streaming.StreamTest$AwaitTerminationTester$.test(StreamTest.scala:935)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.testAwaitAnyTermination(StreamingQueryManagerSuite.scala:445)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$new$10(StreamingQueryManagerSuite.scala:221)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$new$10$adapted(StreamingQueryManagerSuite.scala:140)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$withQueriesOn$1(StreamingQueryManagerSuite.scala:421)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
[info]   at org.scalatest.concurrent.TimeLimits.failAfterImpl(TimeLimits.scala:239)
[info]   at org.scalatest.concurrent.TimeLimits.failAfterImpl$(TimeLimits.scala:233)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.failAfterImpl(StreamingQueryManagerSuite.scala:39)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:230)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:229)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.failAfter(StreamingQueryManagerSuite.scala:39)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.withQueriesOn(StreamingQueryManagerSuite.scala:397)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$new$8(StreamingQueryManagerSuite.scala:140)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
[info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$testQuietly$1(SQLTestUtils.scala:115)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64)
[info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)
[info]   at org.scalatest.S
{code}
[jira] [Updated] (SPARK-38080) Flaky test: StreamingQueryManagerSuite: 'awaitAnyTermination with timeout and resetTerminated'
[ https://issues.apache.org/jira/browse/SPARK-38080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-38080:
---------------------------------
Description:

{code:java}
[info] - awaitAnyTermination with timeout and resetTerminated *** FAILED *** (14 seconds, 893 milliseconds)
[info]   Did not throw SparkException when expected. Expected exception org.apache.spark.sql.streaming.StreamingQueryException to be thrown, but no exception was thrown (StreamTest.scala:935)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1563)
[info]   at org.scalatest.Assertions.intercept(Assertions.scala:766)
[info]   at org.scalatest.Assertions.intercept$(Assertions.scala:746)
[info]   at org.scalatest.funsuite.AnyFunSuite.intercept(AnyFunSuite.scala:1563)
[info]   at org.apache.spark.sql.streaming.StreamTest$AwaitTerminationTester$.$anonfun$test$4(StreamTest.scala:935)
[info]   at org.scalatest.Assertions.withClue(Assertions.scala:1065)
[info]   at org.scalatest.Assertions.withClue$(Assertions.scala:1052)
[info]   at org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1563)
[info]   at org.apache.spark.sql.streaming.StreamTest$AwaitTerminationTester$.test(StreamTest.scala:935)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.testAwaitAnyTermination(StreamingQueryManagerSuite.scala:445)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$new$10(StreamingQueryManagerSuite.scala:207)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$new$10$adapted(StreamingQueryManagerSuite.scala:140)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$withQueriesOn$1(StreamingQueryManagerSuite.scala:421)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
[info]   at org.scalatest.concurrent.TimeLimits.failAfterImpl(TimeLimits.scala:239)
[info]   at org.scalatest.concurrent.TimeLimits.failAfterImpl$(TimeLimits.scala:233)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.failAfterImpl(StreamingQueryManagerSuite.scala:39)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:230)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:229)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.failAfter(StreamingQueryManagerSuite.scala:39)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.withQueriesOn(StreamingQueryManagerSuite.scala:397)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$new$8(StreamingQueryManagerSuite.scala:140)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
[info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$testQuietly$1(SQLTestUtils.scala:115)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64)
[info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(En
{code}
[jira] [Created] (SPARK-38080) Flaky test: StreamingQueryManagerSuite: 'awaitAnyTermination with timeout and resetTerminated'
Shixiong Zhu created SPARK-38080:
---------------------------------

Summary: Flaky test: StreamingQueryManagerSuite: 'awaitAnyTermination with timeout and resetTerminated'
Key: SPARK-38080
URL: https://issues.apache.org/jira/browse/SPARK-38080
Project: Spark
Issue Type: Test
Components: Structured Streaming, Tests
Affects Versions: 3.2.1
Reporter: Shixiong Zhu

{code:java}
[info] - awaitAnyTermination with timeout and resetTerminated *** FAILED *** (18 seconds, 333 milliseconds)
[info]   Did not throw SparkException when expected. Expected exception org.apache.spark.sql.streaming.StreamingQueryException to be thrown, but no exception was thrown (StreamTest.scala:935)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1563)
[info]   at org.scalatest.Assertions.intercept(Assertions.scala:766)
[info]   at org.scalatest.Assertions.intercept$(Assertions.scala:746)
[info]   at org.scalatest.funsuite.AnyFunSuite.intercept(AnyFunSuite.scala:1563)
[info]   at org.apache.spark.sql.streaming.StreamTest$AwaitTerminationTester$.$anonfun$test$4(StreamTest.scala:935)
[info]   at org.scalatest.Assertions.withClue(Assertions.scala:1065)
[info]   at org.scalatest.Assertions.withClue$(Assertions.scala:1052)
[info]   at org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1563)
[info]   at org.apache.spark.sql.streaming.StreamTest$AwaitTerminationTester$.test(StreamTest.scala:935)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.testAwaitAnyTermination(StreamingQueryManagerSuite.scala:445)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$new$10(StreamingQueryManagerSuite.scala:207)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$new$10$adapted(StreamingQueryManagerSuite.scala:140)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$withQueriesOn$1(StreamingQueryManagerSuite.scala:421)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
[info]   at org.scalatest.concurrent.TimeLimits.failAfterImpl(TimeLimits.scala:239)
[info]   at org.scalatest.concurrent.TimeLimits.failAfterImpl$(TimeLimits.scala:233)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.failAfterImpl(StreamingQueryManagerSuite.scala:39)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:230)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:229)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.failAfter(StreamingQueryManagerSuite.scala:39)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.withQueriesOn(StreamingQueryManagerSuite.scala:397)
[info]   at org.apache.spark.sql.streaming.StreamingQueryManagerSuite.$anonfun$new$8(StreamingQueryManagerSuite.scala:140)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
[info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$testQuietly$1(SQLTestUtils.scala:115)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64)
[info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$
{code}
[jira] [Assigned] (SPARK-37646) Avoid touching Scala reflection APIs in the lit function
[ https://issues.apache.org/jira/browse/SPARK-37646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu reassigned SPARK-37646:

Assignee: Shixiong Zhu

> Avoid touching Scala reflection APIs in the lit function
>
> Key: SPARK-37646
> URL: https://issues.apache.org/jira/browse/SPARK-37646
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Shixiong Zhu
> Assignee: Shixiong Zhu
> Priority: Major
>
> Currently lit is slow when the concurrency is high as it needs to hit the
> Scala reflection code which hits global locks. For example, running the
> following test locally using Spark 3.2 shows the difference:
> {code:java}
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
>
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.Column
> import org.apache.spark.sql.catalyst.expressions.Literal
>
> val parallelism = 50
>
> def testLiteral(): Unit = {
>   val ts = for (_ <- 0 until parallelism) yield {
>     new Thread() {
>       override def run() {
>         for (_ <- 0 until 50) {
>           new Column(Literal(0L))
>         }
>       }
>     }
>   }
>   ts.foreach(_.start())
>   ts.foreach(_.join())
> }
>
> def testLit(): Unit = {
>   val ts = for (_ <- 0 until parallelism) yield {
>     new Thread() {
>       override def run() {
>         for (_ <- 0 until 50) {
>           lit(0L)
>         }
>       }
>     }
>   }
>   ts.foreach(_.start())
>   ts.foreach(_.join())
> }
>
> println("warmup")
> testLiteral()
> testLit()
>
> println("lit: false")
> spark.time {
>   testLiteral()
> }
> println("lit: true")
> spark.time {
>   testLit()
> }
> // Exiting paste mode, now interpreting.
>
> warmup
> lit: false
> Time taken: 8 ms
> lit: true
> Time taken: 682 ms
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.Column
> import org.apache.spark.sql.catalyst.expressions.Literal
> parallelism: Int = 50
> testLiteral: ()Unit
> testLit: ()Unit
> {code}
>

-- This message was sent by Atlassian Jira (v8.20.1#820001)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37646) Avoid touching Scala reflection APIs in the lit function
Shixiong Zhu created SPARK-37646:

Summary: Avoid touching Scala reflection APIs in the lit function
Key: SPARK-37646
URL: https://issues.apache.org/jira/browse/SPARK-37646
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.2.0
Reporter: Shixiong Zhu

Currently lit is slow when the concurrency is high as it needs to hit the Scala reflection code which hits global locks. For example, running the following test locally using Spark 3.2 shows the difference:

{code:java}
scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal

val parallelism = 50

def testLiteral(): Unit = {
  val ts = for (_ <- 0 until parallelism) yield {
    new Thread() {
      override def run() {
        for (_ <- 0 until 50) {
          new Column(Literal(0L))
        }
      }
    }
  }
  ts.foreach(_.start())
  ts.foreach(_.join())
}

def testLit(): Unit = {
  val ts = for (_ <- 0 until parallelism) yield {
    new Thread() {
      override def run() {
        for (_ <- 0 until 50) {
          lit(0L)
        }
      }
    }
  }
  ts.foreach(_.start())
  ts.foreach(_.join())
}

println("warmup")
testLiteral()
testLit()

println("lit: false")
spark.time {
  testLiteral()
}
println("lit: true")
spark.time {
  testLit()
}
// Exiting paste mode, now interpreting.

warmup
lit: false
Time taken: 8 ms
lit: true
Time taken: 682 ms
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal
parallelism: Int = 50
testLiteral: ()Unit
testLit: ()Unit
{code}

-- This message was sent by Atlassian Jira (v8.20.1#820001)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
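The fix direction for this ticket is to avoid the reflective path for common literal types. A standalone sketch of the idea (names such as `LitSketch`, `fastTag`, and `reflectiveTag` are illustrative, not Spark's actual code): recognize well-known runtime types with a plain pattern match and fall back to the expensive, lock-protected path only for unknown types.

```scala
// Sketch: avoid a slow, lock-protected fallback (standing in for Scala
// reflection) by recognizing common literal types with a pattern match.
object LitSketch {
  // Placeholder for the slow path that would hit global reflection locks.
  def reflectiveTag(v: Any): String = v.getClass.getName

  def fastTag(v: Any): String = v match {
    case _: Int     => "IntegerType"
    case _: Long    => "LongType"
    case _: Double  => "DoubleType"
    case _: String  => "StringType"
    case _: Boolean => "BooleanType"
    case other      => reflectiveTag(other) // only unknown types pay the cost
  }
}
```

Because the match touches no shared state, many threads can call `fastTag` concurrently without contending on a global lock, which is exactly the contention the benchmark above exposes.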
[jira] [Created] (SPARK-36519) Store the RocksDB format in the checkpoint for a streaming query
Shixiong Zhu created SPARK-36519:

Summary: Store the RocksDB format in the checkpoint for a streaming query
Key: SPARK-36519
URL: https://issues.apache.org/jira/browse/SPARK-36519
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 3.2.0
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu

RocksDB provides backward compatibility, but it doesn't always provide forward compatibility. It's better to store the RocksDB format version in the checkpoint, so that we have enough information to provide a rollback guarantee when a new Spark version upgrades the RocksDB version and that upgrade introduces an incompatible format change.

A typical case is when a user upgrades their query to a new Spark version, and this new Spark version ships a new RocksDB version that may use a new format. The user then hits some bug and decides to roll back. But in the old Spark version, the old RocksDB version cannot read the new format.

To handle this case, we will write the RocksDB format version to the checkpoint. When restarting from a checkpoint, we will force RocksDB to use the format version stored in the checkpoint. This ensures the user can roll back their Spark version if needed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
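The mechanism described above can be sketched standalone (the `format-version` file name and single-integer layout are assumptions for illustration, not Spark's actual checkpoint format): record the format version on the first run, and on restart pin the engine to the recorded version rather than the newest one the library supports.

```scala
import java.nio.file.{Files, Path}

// Sketch: pin a storage format version in the checkpoint so that after a
// rollback, an old reader is never asked to decode a newer format.
object FormatVersionSketch {
  // Illustrative: the newest format version this build would write.
  val SupportedVersion = 5

  def versionFile(checkpointDir: Path): Path = checkpointDir.resolve("format-version")

  // On restart, reuse the recorded version; on a fresh checkpoint, record ours.
  def resolveVersion(checkpointDir: Path): Int = {
    val f = versionFile(checkpointDir)
    if (Files.exists(f)) {
      val v = new String(Files.readAllBytes(f), "UTF-8").trim.toInt
      require(v <= SupportedVersion,
        s"checkpoint uses format $v, newer than supported $SupportedVersion")
      v // force the recorded (possibly older) format for rollback safety
    } else {
      Files.write(f, SupportedVersion.toString.getBytes("UTF-8"))
      SupportedVersion
    }
  }
}
```

The key design point is that the recorded version, not the library default, wins on restart, which is what makes the checkpoint readable after a downgrade.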
[jira] [Commented] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319631#comment-17319631 ] Shixiong Zhu commented on SPARK-31923: -- Do you have a reproduction? It's weird to see `java.util.Collections$SynchronizedSet cannot be cast to java.util.List` since before calling `v.asScala.toList`, the pattern match `case v: java.util.List[_]` should not accept `java.util.Collections$SynchronizedSet`. > Event log cannot be generated when some internal accumulators use unexpected > types > -- > > Key: SPARK-31923 > URL: https://issues.apache.org/jira/browse/SPARK-31923 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.7, 3.0.1, 3.1.0 > > > A user may use internal accumulators by adding the "internal.metrics." prefix > to the accumulator name to hide sensitive information from UI (Accumulators > except internal ones will be shown in Spark UI). > However, *org.apache.spark.util.JsonProtocol.accumValueToJson* assumes an > internal accumulator has only 3 possible types: int, long, and > java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an > unexpected type, it will crash. > An event log that contains such accumulator will be dropped because it cannot > be converted to JSON, and it will cause weird UI issue when rendering in > Spark History Server. For example, if `SparkListenerTaskEnd` is dropped > because of this issue, the user will see the task is still running even if it > was finished. > It's better to make *accumValueToJson* more robust. 
>
> How to reproduce it:
> - Enable Spark event log
> - Run the following command:
> {code}
> scala> val accu = sc.doubleAccumulator("internal.metrics.foo")
> accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, name: Some(internal.metrics.foo), value: 0.0)
>
> scala> sc.parallelize(1 to 1, 1).foreach { _ => accu.add(1.0) }
> 20/06/06 16:11:27 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception
> java.lang.ClassCastException: java.lang.Double cannot be cast to java.util.List
> at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:330)
> at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:306)
> at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:306)
> at scala.Option.map(Option.scala:146)
> at org.apache.spark.util.JsonProtocol$.accumulableInfoToJson(JsonProtocol.scala:306)
> at org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:299)
> at org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:299)
> at scala.collection.immutable.List.map(List.scala:284)
> at org.apache.spark.util.JsonProtocol$.accumulablesToJson(JsonProtocol.scala:299)
> at org.apache.spark.util.JsonProtocol$.taskInfoToJson(JsonProtocol.scala:291)
> at org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:145)
> at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:76)
> at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:138)
> at org.apache.spark.scheduler.EventLoggingListener.onTaskEnd(EventLoggingListener.scala:158)
> at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:45)
> at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
> at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
> at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
> at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
> at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
> at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
> at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEvent
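The "make accumValueToJson more robust" direction suggested in the description can be sketched in plain Scala (the function below is an illustration, not Spark's actual implementation): recognize the expected internal types, and fall back to a string rendering instead of letting a `ClassCastException` escape into the listener.

```scala
// Sketch: defensive conversion of an accumulator value for event logging.
// Expected internal types are handled specially; anything else becomes a
// JSON string instead of crashing the event-logging listener.
def robustAccumValueToJson(value: Any): String = value match {
  case i: java.lang.Integer => i.toString // expected internal type
  case l: java.lang.Long    => l.toString // expected internal type
  case _: java.util.List[_] => "[]"       // placeholder for the (BlockId, BlockStatus) list case
  case other                => "\"" + other.toString + "\"" // fallback: never throw
}
```

With this shape, the `DoubleAccumulator` from the reproduction above would serialize as the JSON string "1.0" rather than killing the `EventLoggingListener`.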
[jira] [Commented] (SPARK-19256) Hive bucketing write support
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300594#comment-17300594 ] Shixiong Zhu commented on SPARK-19256: -- [~chengsu] Thanks for the answers. Sounds like Metastore is a requirement to distinguish two formats. If I use Spark to write to a table using the table path directly, it will break the table. Right? > Hive bucketing write support > > > Key: SPARK-19256 > URL: https://issues.apache.org/jira/browse/SPARK-19256 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0 >Reporter: Tejas Patil >Priority: Minor > > Update (2020 by Cheng Su): > We use this JIRA to track progress for Hive bucketing write support in Spark. > The goal is for Spark to write Hive bucketed table, to be compatible with > other compute engines (Hive and Presto). > > Current status for Hive bucketed table in Spark: > Not support for reading Hive bucketed table: read bucketed table as > non-bucketed table. > Wrong behavior for writing Hive ORC and Parquet bucketed table: write > orc/parquet bucketed table as non-bucketed table (code path: > InsertIntoHadoopFsRelationCommand -> FileFormatWriter). > Do not allow for writing Hive non-ORC/Parquet bucketed table: throw exception > by default if writing non-orc/parquet bucketed table (code path: > InsertIntoHiveTable), and exception can be disabled by setting config > `hive.enforce.bucketing`=false and `hive.enforce.sorting`=false, which will > write as non-bucketed table. > > Current status for Hive bucketed table in Hive: > Hive 3.0.0 and after: support writing bucketed table with Hive murmur3hash > (https://issues.apache.org/jira/browse/HIVE-18910). > Hive 1.x.y and 2.x.y: support writing bucketed table with Hive hivehash. > Hive on Tez: support zero and multiple files per bucket > (https://issues.apache.org/jira/browse/HIVE-14014). 
And more code pointer on > read path - > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java#L183-L212] > . > > Current status for Hive bucketed table in Presto (take presto-sql here): > Support writing bucketed table with Hive murmur3hash and hivehash > ([https://github.com/prestosql/presto/pull/1697]). > Support zero and multiple files per bucket > ([https://github.com/prestosql/presto/pull/822]). > > TLDR is to achieve Hive bucketed table compatibility across Spark, Presto and > Hive. Here with this JIRA, we need to add support writing Hive bucketed table > with Hive murmur3hash (for Hive 3.x.y) and hivehash (for Hive 1.x.y and > 2.x.y). > > To allow Spark efficiently read Hive bucketed table, this needs more radical > change and we decide to wait until data source v2 supports bucketing, and do > the read path on data source v2. Read path will not covered by this JIRA. > > Original description (2017 by Tejas Patil): > JIRA to track design discussions and tasks related to Hive bucketing support > in Spark. > Proposal : > [https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit?usp=sharing] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19256) Hive bucketing write support
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300578#comment-17300578 ]

Shixiong Zhu commented on SPARK-19256:
-

I read [https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit?usp=sharing] but didn't find any section about compatibility. I'm curious about both the forward and backward compatibility. For example:
* What happens if using an old Spark version to read this new format? Does it treat it as a non-bucketed table, or read it as a bucketed table but may return different results?
* How does Spark know which bucket format to use when reading an existing table, and how does it decide whether to use the old format or the new format to read and write?

> Hive bucketing write support
>
> Key: SPARK-19256
> URL: https://issues.apache.org/jira/browse/SPARK-19256
> Project: Spark
> Issue Type: Umbrella
> Components: SQL
> Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0
> Reporter: Tejas Patil
> Priority: Minor
>
> Update (2020 by Cheng Su):
> We use this JIRA to track progress for Hive bucketing write support in Spark. The goal is for Spark to write Hive bucketed table, to be compatible with other compute engines (Hive and Presto).
>
> Current status for Hive bucketed table in Spark:
> No support for reading Hive bucketed table: read bucketed table as non-bucketed table.
> Wrong behavior for writing Hive ORC and Parquet bucketed table: write orc/parquet bucketed table as non-bucketed table (code path: InsertIntoHadoopFsRelationCommand -> FileFormatWriter).
> Do not allow for writing Hive non-ORC/Parquet bucketed table: throw exception by default if writing non-orc/parquet bucketed table (code path: InsertIntoHiveTable), and exception can be disabled by setting config `hive.enforce.bucketing`=false and `hive.enforce.sorting`=false, which will write as non-bucketed table.
>
> Current status for Hive bucketed table in Hive:
> Hive 3.0.0 and after: support writing bucketed table with Hive murmur3hash (https://issues.apache.org/jira/browse/HIVE-18910).
> Hive 1.x.y and 2.x.y: support writing bucketed table with Hive hivehash.
> Hive on Tez: support zero and multiple files per bucket (https://issues.apache.org/jira/browse/HIVE-14014). And more code pointer on read path - [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java#L183-L212] .
>
> Current status for Hive bucketed table in Presto (take presto-sql here):
> Support writing bucketed table with Hive murmur3hash and hivehash ([https://github.com/prestosql/presto/pull/1697]).
> Support zero and multiple files per bucket ([https://github.com/prestosql/presto/pull/822]).
>
> TLDR is to achieve Hive bucketed table compatibility across Spark, Presto and Hive. Here with this JIRA, we need to add support writing Hive bucketed table with Hive murmur3hash (for Hive 3.x.y) and hivehash (for Hive 1.x.y and 2.x.y).
>
> To allow Spark to efficiently read Hive bucketed table, this needs a more radical change and we decided to wait until data source v2 supports bucketing, and do the read path on data source v2. The read path will not be covered by this JIRA.
>
> Original description (2017 by Tejas Patil):
> JIRA to track design discussions and tasks related to Hive bucketing support in Spark.
> Proposal: [https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit?usp=sharing]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34626) UnresolvedAttribute.sql may return an incorrect sql
[ https://issues.apache.org/jira/browse/SPARK-34626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-34626: - Description: UnresolvedAttribute("a" :: "b" :: Nil).sql returns {code:java} `a.b`{code} The correct one should be either a.b or {code:java} `a`.`b`{code} was: UnresolvedAttribute("a" :: "b" :: Nil).sql returns {{`a.b`}} The correct one should be either a.b or {{`a`.`b`}} > UnresolvedAttribute.sql may return an incorrect sql > --- > > Key: SPARK-34626 > URL: https://issues.apache.org/jira/browse/SPARK-34626 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Shixiong Zhu >Priority: Major > > UnresolvedAttribute("a" :: "b" :: Nil).sql > returns > {code:java} > `a.b`{code} > The correct one should be either a.b or > {code:java} > `a`.`b`{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34626) UnresolvedAttribute.sql may return an incorrect sql
Shixiong Zhu created SPARK-34626:

Summary: UnresolvedAttribute.sql may return an incorrect sql
Key: SPARK-34626
URL: https://issues.apache.org/jira/browse/SPARK-34626
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.1.1, 3.0.2
Reporter: Shixiong Zhu

UnresolvedAttribute("a" :: "b" :: Nil).sql returns {{`a.b`}}. The correct one should be either a.b or {{`a`.`b`}}.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
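The expected output can be produced by quoting each name part separately and escaping embedded backticks by doubling them. A small standalone helper illustrating this (not Spark's actual implementation):

```scala
// Sketch: render a multi-part identifier as `a`.`b`, escaping backticks
// inside a single part by doubling them.
def quoteParts(parts: Seq[String]): String =
  parts.map(p => "`" + p.replace("`", "``") + "`").mkString(".")
```

With this scheme, Seq("a", "b") renders as `a`.`b`, while a single column literally named a.b renders as `a.b`, so the two cases stay distinguishable.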
[jira] [Created] (SPARK-34599) INSERT INTO OVERWRITE doesn't support partition columns containing dot for DSv2
Shixiong Zhu created SPARK-34599:

Summary: INSERT INTO OVERWRITE doesn't support partition columns containing dot for DSv2
Key: SPARK-34599
URL: https://issues.apache.org/jira/browse/SPARK-34599
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.1.1, 3.0.2
Reporter: Shixiong Zhu

ResolveInsertInto.staticDeleteExpression uses an incorrect method to create UnresolvedAttribute, so it doesn't support partition column names that contain dots.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
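The underlying distinction can be illustrated standalone (the helpers below are illustrative; they mimic the contrast between a constructor that parses a name, splitting on dots, and one that keeps the whole string as a single column name, as `UnresolvedAttribute.quoted` does in Spark):

```scala
// Sketch: why creating an attribute from a raw string breaks names with dots.
// parsedParts mimics a name-parsing constructor; singlePart mimics one that
// treats the string as a single part.
def parsedParts(name: String): Seq[String] = name.split('.').toSeq
def singlePart(name: String): Seq[String] = Seq(name)
```

For a partition column literally named a.b, the parsing variant yields Seq("a", "b"), i.e. a lookup of field b under column a, while the single-part variant preserves the intended Seq("a.b").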
[jira] [Created] (SPARK-34556) Checking duplicate static partition columns doesn't respect case sensitive conf
Shixiong Zhu created SPARK-34556:

Summary: Checking duplicate static partition columns doesn't respect case sensitive conf
Key: SPARK-34556
URL: https://issues.apache.org/jira/browse/SPARK-34556
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.2, 2.4.7, 3.1.1
Reporter: Shixiong Zhu

When parsing the partition spec, Spark calls `org.apache.spark.sql.catalyst.parser.ParserUtils.checkDuplicateKeys` to check whether the list contains duplicate partition column names. But this method is always case sensitive, so it misses duplicate partition column names that differ only in case.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
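A duplicate check that honors the case-sensitivity conf can be sketched as follows (a standalone illustration, not the actual checkDuplicateKeys code): normalize the keys before grouping when comparisons should be case insensitive.

```scala
// Sketch: find duplicate partition column names, optionally ignoring case.
def findDuplicateKeys(keys: Seq[String], caseSensitive: Boolean): Seq[String] = {
  val normalized = if (caseSensitive) keys else keys.map(_.toLowerCase)
  normalized.groupBy(identity).collect {
    case (k, vs) if vs.size > 1 => k // appears more than once after normalization
  }.toSeq.sorted
}
```

Under a case-insensitive resolver, a spec like PARTITION (c1=1, C1=2) would then be rejected as a duplicate instead of slipping through.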
[jira] [Commented] (SPARK-34034) "show create table" doesn't work for v2 table
[ https://issues.apache.org/jira/browse/SPARK-34034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260095#comment-17260095 ] Shixiong Zhu commented on SPARK-34034: -- NVM. I just realized `show create table` didn't return the correct create table command for v2 table in Spark 3.0.1 (missing the table schema). Yep, it makes sense to disallow it since it returns an incorrect answer. > "show create table" doesn't work for v2 table > - > > Key: SPARK-34034 > URL: https://issues.apache.org/jira/browse/SPARK-34034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Shixiong Zhu >Priority: Blocker > Labels: regression > > I was QAing Spark 3.1.0 RC1 and found one regression: "show create table" > doesn't work for v2 table. > But when using Spark 3.0.1, "show create table" works for v2 table. > Steps to test: > {code:java} > /bin/spark-shell --packages io.delta:delta-core_2.12:0.7.0 --conf > "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > scala> spark.sql("create table foo(i INT) using delta") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("show create table foo").show(false) > +---+ > |createtab_stmt | > +---+ > |CREATE TABLE `default`.`foo` ( > ) > USING delta > | > +---+ > {code} > Looks like it's caused by [https://github.com/apache/spark/pull/30321] > which blocks "show create table" for v2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34034) "show create table" doesn't work for v2 table
[ https://issues.apache.org/jira/browse/SPARK-34034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-34034. -- Resolution: Not A Problem > "show create table" doesn't work for v2 table > - > > Key: SPARK-34034 > URL: https://issues.apache.org/jira/browse/SPARK-34034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Shixiong Zhu >Priority: Blocker > Labels: regression > > I was QAing Spark 3.1.0 RC1 and found one regression: "show create table" > doesn't work for v2 table. > But when using Spark 3.0.1, "show create table" works for v2 table. > Steps to test: > {code:java} > /bin/spark-shell --packages io.delta:delta-core_2.12:0.7.0 --conf > "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > scala> spark.sql("create table foo(i INT) using delta") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("show create table foo").show(false) > +---+ > |createtab_stmt | > +---+ > |CREATE TABLE `default`.`foo` ( > ) > USING delta > | > +---+ > {code} > Looks like it's caused by [https://github.com/apache/spark/pull/30321] > which blocks "show create table" for v2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34034) "show create table" doesn't work for v2 table
[ https://issues.apache.org/jira/browse/SPARK-34034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-34034:
-
Description:
I was QAing Spark 3.1.0 RC1 and found one regression: "show create table" doesn't work for v2 table.
But when using Spark 3.0.1, "show create table" works for v2 table.
Steps to test:
{code:java}
/bin/spark-shell --packages io.delta:delta-core_2.12:0.7.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

scala> spark.sql("create table foo(i INT) using delta")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show create table foo").show(false)
+---+
|createtab_stmt |
+---+
|CREATE TABLE `default`.`foo` (
)
USING delta
|
+---+
{code}
Looks like it's caused by [https://github.com/apache/spark/pull/30321] which blocks "show create table" for v2.

was:
I was QAing Spark 3.1.0 RC1 and found one regression: "show create table" doesn't work for v2 table.
But when using Spark 3.0.1, "show create table" works for v2 table.
Steps to test:
{code:java}
/bin/spark-shell --packages io.delta:delta-core_2.12:0.7.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

scala> spark.sql("create table foo(i INT) using delta")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show create table foo").show(false)
+---+
|createtab_stmt |
+---+
|CREATE TABLE `default`.`foo` (
)
USING delta
|
+---+
{code}
Looks like it's caused by [https://github.com/apache/spark/pull/30321]

> "show create table" doesn't work for v2 table
>
> Key: SPARK-34034
> URL: https://issues.apache.org/jira/browse/SPARK-34034
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Shixiong Zhu
> Priority: Blocker
> Labels: regression
>
> I was QAing Spark 3.1.0 RC1 and found one regression: "show create table"
> doesn't work for v2 table.
> But when using Spark 3.0.1, "show create table" works for v2 table.
> Steps to test:
> {code:java}
> /bin/spark-shell --packages io.delta:delta-core_2.12:0.7.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
>
> scala> spark.sql("create table foo(i INT) using delta")
> res0: org.apache.spark.sql.DataFrame = []
>
> scala> spark.sql("show create table foo").show(false)
> +---+
> |createtab_stmt |
> +---+
> |CREATE TABLE `default`.`foo` (
> )
> USING delta
> |
> +---+
> {code}
> Looks like it's caused by [https://github.com/apache/spark/pull/30321]
> which blocks "show create table" for v2.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34034) "show create table" doesn't work for v2 table
[ https://issues.apache.org/jira/browse/SPARK-34034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260061#comment-17260061 ] Shixiong Zhu commented on SPARK-34034: -- [~cloud_fan] > "show create table" doesn't work for v2 table > - > > Key: SPARK-34034 > URL: https://issues.apache.org/jira/browse/SPARK-34034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Shixiong Zhu >Priority: Blocker > > I was QAing Spark 3.1.0 RC1 and found one regression: "show create table" > doesn't work for v2 table. > But when using Spark 3.0.1, "show create table" works for v2 table. > Steps to test: > {code:java} > /bin/spark-shell --packages io.delta:delta-core_2.12:0.7.0 --conf > "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > scala> spark.sql("create table foo(i INT) using delta") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("show create table foo").show(false) > +---+ > |createtab_stmt | > +---+ > |CREATE TABLE `default`.`foo` ( > ) > USING delta > | > +---+ > {code} > Looks like it's caused by [https://github.com/apache/spark/pull/30321] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34034) "show create table" doesn't work for v2 table
[ https://issues.apache.org/jira/browse/SPARK-34034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-34034: - Labels: regression (was: regresion) > "show create table" doesn't work for v2 table > - > > Key: SPARK-34034 > URL: https://issues.apache.org/jira/browse/SPARK-34034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Shixiong Zhu >Priority: Blocker > Labels: regression > > I was QAing Spark 3.1.0 RC1 and found one regression: "show create table" > doesn't work for v2 table. > But when using Spark 3.0.1, "show create table" works for v2 table. > Steps to test: > {code:java} > /bin/spark-shell --packages io.delta:delta-core_2.12:0.7.0 --conf > "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > scala> spark.sql("create table foo(i INT) using delta") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("show create table foo").show(false) > +---+ > |createtab_stmt | > +---+ > |CREATE TABLE `default`.`foo` ( > ) > USING delta > | > +---+ > {code} > Looks like it's caused by [https://github.com/apache/spark/pull/30321] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34034) "show create table" doesn't work for v2 table
[ https://issues.apache.org/jira/browse/SPARK-34034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-34034: - Labels: regresion (was: ) > "show create table" doesn't work for v2 table > - > > Key: SPARK-34034 > URL: https://issues.apache.org/jira/browse/SPARK-34034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Shixiong Zhu >Priority: Blocker > Labels: regresion > > I was QAing Spark 3.1.0 RC1 and found one regression: "show create table" > doesn't work for v2 table. > But when using Spark 3.0.1, "show create table" works for v2 table. > Steps to test: > {code:java} > /bin/spark-shell --packages io.delta:delta-core_2.12:0.7.0 --conf > "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > scala> spark.sql("create table foo(i INT) using delta") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("show create table foo").show(false) > +---+ > |createtab_stmt | > +---+ > |CREATE TABLE `default`.`foo` ( > ) > USING delta > | > +---+ > {code} > Looks like it's caused by [https://github.com/apache/spark/pull/30321] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34034) "show create table" doesn't work for v2 table
Shixiong Zhu created SPARK-34034: Summary: "show create table" doesn't work for v2 table Key: SPARK-34034 URL: https://issues.apache.org/jira/browse/SPARK-34034 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: Shixiong Zhu I was QAing Spark 3.1.0 RC1 and found one regression: "show create table" doesn't work for v2 table. But when using Spark 3.0.1, "show create table" works for v2 table. Steps to test: {code:java} /bin/spark-shell --packages io.delta:delta-core_2.12:0.7.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" scala> spark.sql("create table foo(i INT) using delta") res0: org.apache.spark.sql.DataFrame = [] scala> spark.sql("show create table foo").show(false) +---+ |createtab_stmt | +---+ |CREATE TABLE `default`.`foo` ( ) USING delta | +---+ {code} Looks like it's caused by [https://github.com/apache/spark/pull/30321] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32896) Add DataStreamWriter.toTable API
[ https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242844#comment-17242844 ] Shixiong Zhu commented on SPARK-32896: -- The method name is changed to `toTable` in [https://github.com/apache/spark/pull/30571] > Add DataStreamWriter.toTable API > > > Key: SPARK-32896 > URL: https://issues.apache.org/jira/browse/SPARK-32896 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.1.0 > > > For now, there's no way to write to the table (especially catalog table) even > the table is capable to handle streaming write. > We can add DataStreamWriter. -table- toTable API to let end users specify > table as provider, and let streaming query write into the table. That is just > to specify the table, and the overall usage of DataStreamWriter isn't changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32896) Add DataStreamWriter.toTable API
[ https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-32896: - Description: For now, there's no way to write to the table (especially catalog table) even the table is capable to handle streaming write. We can add DataStreamWriter. -table- toTable API to let end users specify table as provider, and let streaming query write into the table. That is just to specify the table, and the overall usage of DataStreamWriter isn't changed. was: For now, there's no way to write to the table (especially catalog table) even the table is capable to handle streaming write. We can add DataStreamWriter.table API to let end users specify table as provider, and let streaming query write into the table. That is just to specify the table, and the overall usage of DataStreamWriter isn't changed. > Add DataStreamWriter.toTable API > > > Key: SPARK-32896 > URL: https://issues.apache.org/jira/browse/SPARK-32896 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.1.0 > > > For now, there's no way to write to the table (especially catalog table) even > the table is capable to handle streaming write. > We can add DataStreamWriter. -table- toTable API to let end users specify > table as provider, and let streaming query write into the table. That is just > to specify the table, and the overall usage of DataStreamWriter isn't changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32896) Add DataStreamWriter.toTable API
[ https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-32896: - Summary: Add DataStreamWriter.toTable API (was: Add DataStreamWriter.table API) > Add DataStreamWriter.toTable API > > > Key: SPARK-32896 > URL: https://issues.apache.org/jira/browse/SPARK-32896 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.1.0 > > > For now, there's no way to write to the table (especially catalog table) even > the table is capable to handle streaming write. > We can add DataStreamWriter.table API to let end users specify table as > provider, and let streaming query write into the table. That is just to > specify the table, and the overall usage of DataStreamWriter isn't changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
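The rename recorded above (DataStreamWriter.table to DataStreamWriter.toTable) makes the method read as a terminal action rather than another setter. A Spark-free, hypothetical Python sketch of that builder shape — the class and method names below are illustrative only, not Spark's actual implementation:

```python
# Hypothetical, Spark-free sketch of the builder pattern behind
# DataStreamWriter.toTable: chained option() calls configure the writer,
# and toTable(name) is the terminal call that names the target table and
# "starts" the query. Only the terminal call specifies the table; the rest
# of the builder usage stays unchanged, matching the intent described above.
class StreamWriter:
    def __init__(self):
        self._options = {}

    def option(self, key, value):
        self._options[key] = value
        return self  # chainable, like the real DataStreamWriter

    def toTable(self, table_name):
        # In real Spark this would start a streaming query writing to the
        # catalog table; here we just return the resolved configuration.
        return {"table": table_name, "options": dict(self._options)}

query = (StreamWriter()
         .option("checkpointLocation", "/tmp/ckpt")
         .toTable("events"))
print(query["table"])  # -> events
```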
[jira] [Resolved] (SPARK-31953) Add Spark Structured Streaming History Server Support
[ https://issues.apache.org/jira/browse/SPARK-31953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-31953. -- Fix Version/s: 3.1.0 Resolution: Done > Add Spark Structured Streaming History Server Support > - > > Key: SPARK-31953 > URL: https://issues.apache.org/jira/browse/SPARK-31953 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 2.4.6, 3.0.0 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31953) Add Spark Structured Streaming History Server Support
[ https://issues.apache.org/jira/browse/SPARK-31953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-31953: Assignee: Genmao Yu > Add Spark Structured Streaming History Server Support > - > > Key: SPARK-31953 > URL: https://issues.apache.org/jira/browse/SPARK-31953 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 2.4.6, 3.0.0 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33587) Kill the executor on nested fatal errors
Shixiong Zhu created SPARK-33587: Summary: Kill the executor on nested fatal errors Key: SPARK-33587 URL: https://issues.apache.org/jira/browse/SPARK-33587 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.1 Reporter: Shixiong Zhu Currently we kill the executor when hitting a fatal error. However, if the fatal error is wrapped by another exception, such as: - java.util.concurrent.ExecutionException, com.google.common.util.concurrent.UncheckedExecutionException, or com.google.common.util.concurrent.ExecutionError when using a Guava cache or a Java thread pool. - SparkException thrown from this line: https://github.com/apache/spark/blob/cf98a761de677c733f3c33230e1c63ddb785d5c5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L231 we will still keep the executor running. Fatal errors are usually unrecoverable (such as OutOfMemoryError), and some components may be left in a broken state after hitting one. Hence, it's better to detect a nested fatal error as well and kill the executor; then we can rely on Spark's fault tolerance to recover. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
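The detection described above — unwrapping wrapper exceptions to see whether a fatal error is hiding inside — can be sketched in a few lines. This is an illustrative, Spark-free Python analogue; in the JVM the walk would follow Throwable.getCause(), and the function name, fatal-type list, and depth limit here are all assumptions for the sketch:

```python
# Illustrative sketch of nested-fatal-error detection: walk the exception
# cause chain up to a depth limit and return the first "fatal" link found.
# FATAL_TYPES and MAX_DEPTH are assumptions for this sketch; the JVM version
# would check for OutOfMemoryError and friends via Throwable.getCause().
FATAL_TYPES = (MemoryError, SystemError)
MAX_DEPTH = 5

def find_fatal(exc, max_depth=MAX_DEPTH):
    depth = 0
    while exc is not None and depth < max_depth:
        if isinstance(exc, FATAL_TYPES):
            return exc
        # follow an explicit "raise ... from ..." first, then implicit context
        exc = exc.__cause__ or exc.__context__
        depth += 1
    return None

# A fatal error wrapped by a generic exception is still detected:
try:
    try:
        raise MemoryError("heap exhausted")
    except MemoryError as inner:
        raise RuntimeError("task failed") from inner
except RuntimeError as wrapped:
    found = find_fatal(wrapped)

assert isinstance(found, MemoryError)
```

The depth limit guards against pathological or cyclic cause chains; an unwrapped, non-fatal exception simply returns None and leaves the executor alone.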
[jira] [Commented] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict
[ https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216375#comment-17216375 ] Shixiong Zhu commented on SPARK-21065: -- If you are seeing many active batches, it's likely your streaming application is too slow. You can try to look at the UI and see if there is anything obvious that you can optimize. > Spark Streaming concurrentJobs + StreamingJobProgressListener conflict > -- > > Key: SPARK-21065 > URL: https://issues.apache.org/jira/browse/SPARK-21065 > Project: Spark > Issue Type: Bug > Components: DStreams, Scheduler, Spark Core, Web UI >Affects Versions: 2.1.0 >Reporter: Dan Dutrow >Priority: Major > > My streaming application has 200+ output operations, many of them stateful > and several of them windowed. In an attempt to reduce the processing times, I > set "spark.streaming.concurrentJobs" to 2+. Initial results are very > positive, cutting our processing time from ~3 minutes to ~1 minute, but > eventually we encounter an exception as follows: > (Note that 149697756 ms is 2017-06-09 03:06:00, so it's trying to get a > batch from 45 minutes before the exception is thrown.) 
> 2017-06-09 03:50:28,259 [Spark Listener Bus] ERROR > org.apache.spark.streaming.scheduler.StreamingListenerBus - Listener > StreamingJobProgressListener threw an exception > java.util.NoSuchElementException: key not found 149697756 ms > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.mutable.HashMap.apply(HashMap.scala:65) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener.onOutputOperationCompleted(StreamingJobProgressListener.scala:128) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:67) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:29) > at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.postToAll(StreamingListenerBus.scala:29) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.onOtherEvent(StreamingListenerBus.scala:43) > ... > The Spark code causing the exception is here: > https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC125 > override def onOutputOperationCompleted( > outputOperationCompleted: StreamingListenerOutputOperationCompleted): > Unit = synchronized { > // This method is called before onBatchCompleted > {color:red}runningBatchUIData(outputOperationCompleted.outputOperationInfo.batchTime).{color} > updateOutputOperationInfo(outputOperationCompleted.outputOperationInfo) > } > It seems to me that it may be caused by that batch being removed earlier. 
> https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC102 > override def onBatchCompleted(batchCompleted: > StreamingListenerBatchCompleted): Unit = { > synchronized { > waitingBatchUIData.remove(batchCompleted.batchInfo.batchTime) > > {color:red}runningBatchUIData.remove(batchCompleted.batchInfo.batchTime){color} > val batchUIData = BatchUIData(batchCompleted.batchInfo) > completedBatchUIData.enqueue(batchUIData) > if (completedBatchUIData.size > batchUIDataLimit) { > val removedBatch = completedBatchUIData.dequeue() > batchTimeToOutputOpIdSparkJobIdPair.remove(removedBatch.batchTime) > } > totalCompletedBatches += 1L > totalProcessedRecords += batchUIData.numRecords > } > } > What is the solution here? Should I make my spark streaming context remember > duration a lot longer? ssc.remember(batchDuration * rememberMultiple) > Otherwise, it seems like there should be some kind of existence check on > runningBatchUIData before dereferencing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict
[ https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216341#comment-17216341 ] Shixiong Zhu commented on SPARK-21065: -- `spark.streaming.concurrentJobs` is not safe. Fixing it requires fundamental system changes. We don't have any plan for this. > Spark Streaming concurrentJobs + StreamingJobProgressListener conflict > -- > > Key: SPARK-21065 > URL: https://issues.apache.org/jira/browse/SPARK-21065 > Project: Spark > Issue Type: Bug > Components: DStreams, Scheduler, Spark Core, Web UI >Affects Versions: 2.1.0 >Reporter: Dan Dutrow >Priority: Major > > My streaming application has 200+ output operations, many of them stateful > and several of them windowed. In an attempt to reduce the processing times, I > set "spark.streaming.concurrentJobs" to 2+. Initial results are very > positive, cutting our processing time from ~3 minutes to ~1 minute, but > eventually we encounter an exception as follows: > (Note that 149697756 ms is 2017-06-09 03:06:00, so it's trying to get a > batch from 45 minutes before the exception is thrown.) 
> 2017-06-09 03:50:28,259 [Spark Listener Bus] ERROR > org.apache.spark.streaming.scheduler.StreamingListenerBus - Listener > StreamingJobProgressListener threw an exception > java.util.NoSuchElementException: key not found 149697756 ms > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.mutable.HashMap.apply(HashMap.scala:65) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener.onOutputOperationCompleted(StreamingJobProgressListener.scala:128) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:67) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:29) > at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.postToAll(StreamingListenerBus.scala:29) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.onOtherEvent(StreamingListenerBus.scala:43) > ... > The Spark code causing the exception is here: > https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC125 > override def onOutputOperationCompleted( > outputOperationCompleted: StreamingListenerOutputOperationCompleted): > Unit = synchronized { > // This method is called before onBatchCompleted > {color:red}runningBatchUIData(outputOperationCompleted.outputOperationInfo.batchTime).{color} > updateOutputOperationInfo(outputOperationCompleted.outputOperationInfo) > } > It seems to me that it may be caused by that batch being removed earlier. 
> https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC102 > override def onBatchCompleted(batchCompleted: > StreamingListenerBatchCompleted): Unit = { > synchronized { > waitingBatchUIData.remove(batchCompleted.batchInfo.batchTime) > > {color:red}runningBatchUIData.remove(batchCompleted.batchInfo.batchTime){color} > val batchUIData = BatchUIData(batchCompleted.batchInfo) > completedBatchUIData.enqueue(batchUIData) > if (completedBatchUIData.size > batchUIDataLimit) { > val removedBatch = completedBatchUIData.dequeue() > batchTimeToOutputOpIdSparkJobIdPair.remove(removedBatch.batchTime) > } > totalCompletedBatches += 1L > totalProcessedRecords += batchUIData.numRecords > } > } > What is the solution here? Should I make my spark streaming context remember > duration a lot longer? ssc.remember(batchDuration * rememberMultiple) > Otherwise, it seems like there should be some kind of existence check on > runningBatchUIData before dereferencing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
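The reporter's last suggestion above — an existence check on runningBatchUIData before dereferencing it — can be modeled in a few lines. The real listener is Scala and keyed by batch time on a mutable.HashMap; this is a minimal Python analogue whose names merely mirror the quoted snippet:

```python
# Minimal model of the race described above: onBatchCompleted removes the
# batch from runningBatchUIData, so a late onOutputOperationCompleted for the
# same batch time must not index the map directly. Using .get() with a guard
# mirrors the suggested existence check; a bare running_batches[batch_time]
# would raise KeyError (the Python analogue of NoSuchElementException).
running_batches = {"batch-1000": {"ops": 0}}

def on_output_operation_completed(batch_time):
    batch = running_batches.get(batch_time)  # existence check, no KeyError
    if batch is None:
        return  # batch already completed and removed; ignore the late event
    batch["ops"] += 1

def on_batch_completed(batch_time):
    running_batches.pop(batch_time, None)

on_output_operation_completed("batch-1000")   # normal path
on_batch_completed("batch-1000")              # batch completes, entry removed
on_output_operation_completed("batch-1000")   # late event: safely ignored
```

Whether the right fix is this guard or a longer `ssc.remember(...)` window is the open question in the report; the guard only makes the listener tolerant of the ordering, it does not change which batches the UI retains.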
[jira] [Commented] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155018#comment-17155018 ] Shixiong Zhu commented on SPARK-32256: -- Yep. This doesn't happen in Hadoop 3.1.0 and above > Hive may fail to detect Hadoop version when using isolated classloader > -- > > Key: SPARK-32256 > URL: https://issues.apache.org/jira/browse/SPARK-32256 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > > Spark allows the user to set `spark.sql.hive.metastore.jars` to specify jars > to access Hive Metastore. These jars are loaded by the isolated classloader. > Because we also share Hadoop classes with the isolated classloader, the user > doesn't need to add Hadoop jars to `spark.sql.hive.metastore.jars`, which > means when we are using the isolated classloader, hadoop-common jar is not > available in this case. If Hadoop VersionInfo is not initialized before we > switch to the isolated classloader, and we try to initialize it using the > isolated classloader (the current thread context classloader), it will fail > and report `Unknown` which causes Hive to throw the following exception: > {code} > java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* > format) > at > org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147) > at > org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122) > at > org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88) > at > org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) > at > 
org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:78) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:219) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:67) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108) > at > 
org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3349) > at > org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:217) > at > org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:204) > at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:331) > at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:292) > at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:262) > at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:247) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) > at > org.apache.hadoop.hive.ql.session.S
[jira] [Updated] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-32256: - Description: Spark allows the user to set `spark.sql.hive.metastore.jars` to specify jars to access Hive Metastore. These jars are loaded by the isolated classloader. Because we also share Hadoop classes with the isolated classloader, the user doesn't need to add Hadoop jars to `spark.sql.hive.metastore.jars`, which means when we are using the isolated classloader, hadoop-common jar is not available in this case. If Hadoop VersionInfo is not initialized before we switch to the isolated classloader, and we try to initialize it using the isolated classloader (the current thread context classloader), it will fail and report `Unknown` which causes Hive to throw the following exception: {code} java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format) at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147) at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122) at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88) at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58) at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:78) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84) at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:219) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:67) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108) at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3349) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:217) at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:204) at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:331) at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:292) at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:262) at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:247) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175) at org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:128) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.Isolate
[jira] [Updated] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-32256: - Description: Spark allows the user to set `spark.sql.hive.metastore.jars` to specify jars to access Hive Metastore. These jars are loaded by the isolated classloader. Because we also share Hadoop classes with the isolated classloader, the user doesn't need to add Hadoop jars to `spark.sql.hive.metastore.jars`, which means when we are using the isolated classloader, hadoop-common jar is not available in this case. If Hadoop VersionInfo is not initialized before we switch to the isolated classloader, and we try to initialize it using the isolated classloader (the current thread context classloader), it will fail and report `Unknown` which causes Hive to throw the following exception: {code} 08:49:33.242 ERROR org.apache.hadoop.hive.shims.ShimLoader: Error loading shims java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format) at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147) at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122) at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88) at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58) at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:78) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84) at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:219) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:67) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175) at org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:128) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:301) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:431) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:324) at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:72) at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:71) {code} Technically, T
[jira] [Assigned] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-32256:

Assignee: Shixiong Zhu

> Hive may fail to detect Hadoop version when using isolated classloader
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32256
>                 URL: https://issues.apache.org/jira/browse/SPARK-32256
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>            Priority: Blocker
>
> TODO
> {code}
> 08:49:33.242 ERROR org.apache.hadoop.hive.shims.ShimLoader: Error loading shims
> java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
> 	at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147)
> 	at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122)
> 	at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268)
> 	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
> 	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
> 	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
> 	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:219)
> 	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:67)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548)
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108)
> 	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543)
> 	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:128)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:301)
> 	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:431)
> 	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:324)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:72)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:71)
> {code}
> I marked this as a blocker because it's a regression in 3.0.0.
[jira] [Created] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
Shixiong Zhu created SPARK-32256:

             Summary: Hive may fail to detect Hadoop version when using isolated classloader
                 Key: SPARK-32256
                 URL: https://issues.apache.org/jira/browse/SPARK-32256
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Shixiong Zhu

I marked this as a blocker because it's a regression in 3.0.0.
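The failing check at the top of the trace quoted above can be reproduced in miniature. The sketch below is an illustration, not Hive's actual ShimLoader source (the class name `ShimVersionCheck` and the parsing logic are my own): it only mimics the "A.B.*" major-version parse that rejects a version string of "Unknown", which is what Hadoop's VersionInfo reports when its version-info properties cannot be resolved, e.g. when they are not visible through an isolated classloader.

```java
public class ShimVersionCheck {
    // Simplified stand-in (not Hive's actual ShimLoader.getMajorVersion):
    // take the leading "A.B" of the Hadoop version string, or fail with the
    // same error shape seen in the log above.
    static String getMajorVersion(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new RuntimeException(
                "Illegal Hadoop Version: " + version + " (expected A.B.* format)");
        }
        return parts[0] + "." + parts[1];
    }

    public static void main(String[] args) {
        System.out.println(getMajorVersion("2.7.4"));  // prints 2.7
        // getMajorVersion("Unknown") throws: "Unknown" has no '.' separator,
        // which is exactly the situation when the version cannot be detected.
    }
}
```

This suggests why the bug is classloader-related rather than a genuine version mismatch: the version string itself, not the parser, is the thing that degrades to "Unknown".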
[jira] [Commented] (SPARK-28489) KafkaOffsetRangeCalculator.getRanges may drop offsets
[ https://issues.apache.org/jira/browse/SPARK-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145719#comment-17145719 ] Shixiong Zhu commented on SPARK-28489:

[~klaustelenius] No. The "minPartitions" option for batch queries was added in Spark 3.0. Batch queries before Spark 3.0 ignore the "minPartitions" option.

> KafkaOffsetRangeCalculator.getRanges may drop offsets
> -----------------------------------------------------
>
>                 Key: SPARK-28489
>                 URL: https://issues.apache.org/jira/browse/SPARK-28489
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>            Priority: Blocker
>              Labels: correctness, dataloss
>             Fix For: 2.4.4, 3.0.0
>
> KafkaOffsetRangeCalculator.getRanges may drop offsets due to round-off errors.
> This only affects queries using the "minPartitions" option. A workaround is to remove the "minPartitions" option from the query.
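The round-off failure mode described above can be illustrated with a toy split. This is hypothetical code, not Spark's actual KafkaOffsetRangeCalculator (the class `OffsetSplit` and both methods are made up for illustration): dividing a half-open offset range with truncating integer division silently loses the remainder offsets unless the final slice is pinned to the real end offset.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSplit {
    // Naive split of the half-open offset range [from, until) into n pieces
    // using truncating division. The remainder ((until - from) % n) offsets
    // at the tail are silently dropped -- the class of round-off bug this
    // ticket describes.
    static List<long[]> naiveSplit(long from, long until, int n) {
        long len = (until - from) / n;  // truncates: 10 / 3 == 3
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ranges.add(new long[] {from + i * len, from + (i + 1) * len});
        }
        return ranges;
    }

    // One possible fix: pin the last piece's end to `until`, so every offset
    // in [from, until) is covered exactly once.
    static List<long[]> safeSplit(long from, long until, int n) {
        List<long[]> ranges = naiveSplit(from, until, n);
        ranges.get(ranges.size() - 1)[1] = until;
        return ranges;
    }

    public static void main(String[] args) {
        // Splitting [0, 10) into 3 pieces: the naive split ends at offset 9,
        // so offset 9 itself is never read.
        System.out.println(naiveSplit(0, 10, 3).get(2)[1]);  // prints 9
        System.out.println(safeSplit(0, 10, 3).get(2)[1]);   // prints 10
    }
}
```

This also shows why removing "minPartitions" is a valid workaround: without extra subdivision there is no per-slice division, so no remainder to lose.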
[jira] [Updated] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31923:

Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4

> Event log cannot be generated when some internal accumulators use unexpected types
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-31923
>                 URL: https://issues.apache.org/jira/browse/SPARK-31923
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>            Priority: Major
>             Fix For: 3.0.1, 3.1.0, 2.4.7
>
> A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from the UI (accumulators other than internal ones are shown in the Spark UI).
> However, *org.apache.spark.util.JsonProtocol.accumValueToJson* assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash.
> An event that contains such an accumulator will be dropped because it cannot be converted to JSON, and this causes weird UI issues when rendering in the Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task as still running even though it has finished.
> It's better to make *accumValueToJson* more robust.
>
> How to reproduce it:
> - Enable the Spark event log
> - Run the following commands:
> {code}
> scala> val accu = sc.doubleAccumulator("internal.metrics.foo")
> accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, name: Some(internal.metrics.foo), value: 0.0)
>
> scala> sc.parallelize(1 to 1, 1).foreach { _ => accu.add(1.0) }
> 20/06/06 16:11:27 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception
> java.lang.ClassCastException: java.lang.Double cannot be cast to java.util.List
> 	at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:330)
> 	at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:306)
> 	at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:306)
> 	at scala.Option.map(Option.scala:146)
> 	at org.apache.spark.util.JsonProtocol$.accumulableInfoToJson(JsonProtocol.scala:306)
> 	at org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:299)
> 	at org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:299)
> 	at scala.collection.immutable.List.map(List.scala:284)
> 	at org.apache.spark.util.JsonProtocol$.accumulablesToJson(JsonProtocol.scala:299)
> 	at org.apache.spark.util.JsonProtocol$.taskInfoToJson(JsonProtocol.scala:291)
> 	at org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:145)
> 	at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:76)
> 	at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:138)
> 	at org.apache.spark.scheduler.EventLoggingListener.onTaskEnd(EventLoggingListener.scala:158)
> 	at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:45)
> 	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
> 	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
> 	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
> 	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
> 	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
> 	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
> 	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
> 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> 	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
> 	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
> 	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
> 	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
> {code}
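The type assumption and the suggested hardening can be sketched as follows. This is illustrative Java, not Spark's Scala JsonProtocol source (the class `AccumJson` is made up): the point is the fallback branch that renders an unexpected value, such as the Double from the repro above, as a string instead of casting it to List and throwing ClassCastException.

```java
import java.util.List;

public class AccumJson {
    // Illustrative version of the conversion: the buggy code assumed an
    // "internal.metrics." accumulator value is an int, a long, or a List,
    // and cast anything else to List. A robust version adds a fallback
    // branch for unexpected types instead of crashing the event-log listener.
    static String accumValueToJson(Object value) {
        if (value instanceof Integer || value instanceof Long) {
            return value.toString();   // plain JSON number
        } else if (value instanceof List) {
            return "[...]";            // elided: real code serializes elements
        } else {
            // Fallback: render unexpected types (e.g. Double) as a JSON
            // string rather than throwing ClassCastException.
            return "\"" + value + "\"";
        }
    }

    public static void main(String[] args) {
        System.out.println(accumValueToJson(42L));  // prints 42
        System.out.println(accumValueToJson(1.0));  // prints "1.0"
    }
}
```

With a fallback like this, an odd accumulator degrades to a slightly lossy JSON value instead of dropping the whole `SparkListenerTaskEnd` event from the log.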
[jira] [Updated] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31923:

Affects Version/s: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5
[jira] [Issue Comment Deleted] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31923:

Comment: was deleted (was: User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/28744)
[jira] [Resolved] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-31923.

Assignee: Shixiong Zhu (was: Apache Spark)
Resolution: Fixed
[jira] [Assigned] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-31923:

Assignee: Shixiong Zhu
[jira] [Updated] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31923: - Fix Version/s: 2.4.7 > Event log cannot be generated when some internal accumulators use unexpected > types > -- > > Key: SPARK-31923 > URL: https://issues.apache.org/jira/browse/SPARK-31923 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.6 >Reporter: Shixiong Zhu >Priority: Major > Fix For: 3.0.1, 3.1.0, 2.4.7 > > > A user may use internal accumulators by adding the "internal.metrics." prefix > to the accumulator name to hide sensitive information from UI (Accumulators > except internal ones will be shown in Spark UI). > However, *org.apache.spark.util.JsonProtocol.accumValueToJson* assumes an > internal accumulator has only 3 possible types: int, long, and > java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an > unexpected type, it will crash. > An event log that contains such accumulator will be dropped because it cannot > be converted to JSON, and it will cause weird UI issue when rendering in > Spark History Server. For example, if `SparkListenerTaskEnd` is dropped > because of this issue, the user will see the task is still running even if it > was finished. > It's better to make *accumValueToJson* more robust. 
> > How to reproduce it: > - Enable Spark event log > - Run the following command: > {code} > scala> val accu = sc.doubleAccumulator("internal.metrics.foo") > accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, > name: Some(internal.metrics.foo), value: 0.0) > scala> sc.parallelize(1 to 1, 1).foreach { _ => accu.add(1.0) } > 20/06/06 16:11:27 ERROR AsyncEventQueue: Listener EventLoggingListener threw > an exception > java.lang.ClassCastException: java.lang.Double cannot be cast to > java.util.List > at > org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:330) > at > org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:306) > at > org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:306) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.util.JsonProtocol$.accumulableInfoToJson(JsonProtocol.scala:306) > at > org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:299) > at > org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:299) > at scala.collection.immutable.List.map(List.scala:284) > at > org.apache.spark.util.JsonProtocol$.accumulablesToJson(JsonProtocol.scala:299) > at > org.apache.spark.util.JsonProtocol$.taskInfoToJson(JsonProtocol.scala:291) > at > org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:145) > at > org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:76) > at > org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:138) > at > org.apache.spark.scheduler.EventLoggingListener.onTaskEnd(EventLoggingListener.scala:158) > at > org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92) > at > org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92) > at > org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87) > at > org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) -
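The shape of the fix requested above can be sketched as a defensive pattern match with a fallback, instead of the unconditional cast that raises the ClassCastException in the repro. This is an illustrative Scala (2.13) sketch only, not Spark's actual JsonProtocol code; BlockId and BlockStatus below are simplified stand-ins for Spark's storage classes, and the JSON is rendered as plain strings to keep the example self-contained:

```scala
import scala.jdk.CollectionConverters._

// Simplified stand-ins for Spark's storage classes (not the real ones).
case class BlockId(name: String)
case class BlockStatus(memSize: Long)

// Defensive variant of the conversion: internal accumulators may hold an
// Int, a Long, or a java.util.List[(BlockId, BlockStatus)]; anything else
// (e.g. the Double from the repro above) falls back to JSON null instead
// of crashing with a ClassCastException.
def accumValueToJsonString(name: Option[String], value: Any): String = {
  val internal = name.exists(_.startsWith("internal.metrics."))
  value match {
    case v: Int  => v.toString
    case v: Long => v.toString
    case v: java.util.List[_] if internal =>
      v.asScala.collect {
        case (id: BlockId, status: BlockStatus) =>
          s"""{"Block ID":"${id.name}","Memory Size":${status.memSize}}"""
      }.mkString("[", ",", "]")
    case _ if internal => "null" // fallback: keep the event serializable
    case other => "\"" + other.toString + "\""
  }
}
```

With a fallback like this, an internal accumulator holding a Double would serialize as JSON null rather than aborting event-log generation, so events such as `SparkListenerTaskEnd` would no longer be dropped.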
[jira] [Updated] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31923: - Fix Version/s: 3.1.0 3.0.1
[jira] [Updated] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31923: - Description: A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from UI (Accumulators except internal ones will be shown in Spark UI). However, *org.apache.spark.util.JsonProtocol.accumValueToJson* assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash. An event log that contains such accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task is still running even if it was finished. It's better to make *accumValueToJson* more robust. How to reproduce it: - Enable Spark event log - Run the following command: {code} scala> val accu = sc.doubleAccumulator("internal.metrics.foo") accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, name: Some(internal.metrics.foo), value: 0.0) scala> sc.parallelize(1 to 1, 1).foreach { _ => accu.add(1.0) } 20/06/06 16:11:27 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception java.lang.ClassCastException: java.lang.Double cannot be cast to java.util.List at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:330) {code} was: A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from UI (Accumulators except internal ones will be shown in Spark UI). However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash. An event log that contains such accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped
[jira] [Updated] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31923: - Description: A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from UI (Accumulators except internal ones will be shown in Spark UI). However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash. An event log that contains such accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task is still running even if it was finished. It's better to make accumValueToJson more robust. How to reproduce it: - Enable Spark event log - Run the following command: {code} scala> val accu = sc.doubleAccumulator("internal.metrics.foo") accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, name: Some(internal.metrics.foo), value: 0.0) scala> sc.parallelize(1 to 1, 1).foreach { _ => accu.add(1.0) } 20/06/06 16:11:27 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception java.lang.ClassCastException: java.lang.Double cannot be cast to java.util.List at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:330) {code} was: A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from UI (Accumulators will be shown in Spark UI by default). However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash. An event log that contains such accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this is
[jira] [Updated] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31923: - Description: A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from UI (Accumulators will be shown in Spark UI by default). However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash. An event log that contains such accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task is still running even if it was finished. It's better to make accumValueToJson more robust. How to reproduce it: - Enable Spark event log - Run the following command: {code} scala> val accu = sc.doubleAccumulator("internal.metrics.foo") accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, name: Some(internal.metrics.foo), value: 0.0) scala> sc.parallelize(1 to 1, 1).foreach { _ => accu.add(1.0) } 20/06/06 16:11:27 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception java.lang.ClassCastException: java.lang.Double cannot be cast to java.util.List at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:330) {code} was: A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from UI (Accumulators will be shown in Spark UI by default). However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash. An event log that contains such accumulator will not be able to convert to json, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the ta
[jira] [Updated] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
[ https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31923: - Description: A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from UI (Accumulators will be shown in Spark UI by default). However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash. An event log that contains such accumulator will not be able to convert to json, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task is still running even if it was finished. It's better to make accumValueToJson more robust. How to reproduce it: - Enable Spark event log - Run the following command: {code} scala> val accu = sc.doubleAccumulator("internal.metrics.foo") accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, name: Some(internal.metrics.foo), value: 0.0) scala> sc.parallelize(1 to 1, 1).foreach { _ => accu.add(1.0) } 20/06/06 16:11:27 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception java.lang.ClassCastException: java.lang.Double cannot be cast to java.util.List at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:330) {code} was: A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from UI (Accumulators will be shown in Spark UI by default). However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash. It's better to make accumValueToJson more robust. How to reproduce it: - Enable Spark event log - Run the following command: {code} scala> val accu = sc.doubleAccumulator("internal.metrics.foo") accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, na
[jira] [Created] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types
Shixiong Zhu created SPARK-31923: Summary: Event log cannot be generated when some internal accumulators use unexpected types Key: SPARK-31923 URL: https://issues.apache.org/jira/browse/SPARK-31923 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.6 Reporter: Shixiong Zhu A user may use internal accumulators by adding the "internal.metrics." prefix to the accumulator name to hide sensitive information from UI (Accumulators will be shown in Spark UI by default). However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash. It's better to make accumValueToJson more robust. How to reproduce it: - Enable Spark event log - Run the following command: {code} scala> val accu = sc.doubleAccumulator("internal.metrics.foo") accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, name: Some(internal.metrics.foo), value: 0.0) scala> sc.parallelize(1 to 1, 1).foreach { _ => accu.add(1.0) } 20/06/06 16:11:27 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception java.lang.ClassCastException: java.lang.Double cannot be cast to java.util.List at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:330) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional 
commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30915) FileStreamSinkLog: Avoid reading the metadata log file when finding the latest batch ID
[ https://issues.apache.org/jira/browse/SPARK-30915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-30915. -- Fix Version/s: 3.1.0 Assignee: Jungtaek Lim Resolution: Fixed > FileStreamSinkLog: Avoid reading the metadata log file when finding the > latest batch ID > --- > > Key: SPARK-30915 > URL: https://issues.apache.org/jira/browse/SPARK-30915 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming > Affects Versions: 3.1.0 > Reporter: Jungtaek Lim > Assignee: Jungtaek Lim > Priority: Major > Fix For: 3.1.0 > > > FileStreamSink.addBatch checks the latest batch ID before writing outputs, so > that it can skip writing a batch that was already committed. > While it's valid to compare the current batch with the latest batch ID, the > getLatest() method returns both the batch ID and the content, which means the > latest metadata log file has to be read and deserialized. This introduces heavy > latency when the latest batch is a compacted batch. > We could instead locate the metadata log file for the latest batch ID and do > only the minimal check, without reading its content.
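The minimal check proposed above can be sketched by parsing batch IDs out of metadata file names without opening any file. This is an illustrative Scala sketch under an assumed layout (borrowed from the usual metadata-log convention of one file per batch, named "<batchId>" or "<batchId>.compact"); it is not Spark's actual patch:

```scala
import java.io.File
import scala.util.Try

// Find the latest batch ID from file names alone, under the assumed
// layout: one metadata file per batch, "<batchId>" or "<batchId>.compact".
def latestBatchId(metadataDir: File): Option[Long] = {
  // listFiles() returns null on I/O error, hence the Option wrapper.
  val files = Option(metadataDir.listFiles()).getOrElse(Array.empty[File])
  val ids = files.flatMap { f =>
    // Inspect only the name; the file is never opened, so a large
    // compacted latest batch adds no deserialization latency.
    Try(f.getName.stripSuffix(".compact").toLong).toOption
  }
  if (ids.isEmpty) None else Some(ids.max)
}
```

FileStreamSink.addBatch could then skip an already-committed batch with a check like `latestBatchId(dir).exists(_ >= batchId)`, reading the full metadata content only when it actually needs to write.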
[jira] [Resolved] (SPARK-31542) Backport SPARK-25692 Remove static initialization of worker eventLoop handling chunk fetch requests within TransportContext. This fixes ChunkFetchIntegrationSuite
[ https://issues.apache.org/jira/browse/SPARK-31542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-31542. -- Resolution: Not A Problem > Backport SPARK-25692 Remove static initialization of worker eventLoop > handling chunk fetch requests within TransportContext. This fixes > ChunkFetchIntegrationSuite as well > > > Key: SPARK-31542 > URL: https://issues.apache.org/jira/browse/SPARK-31542 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.6 >Reporter: Holden Karau >Priority: Major > > Backport SPARK-25692 Remove static initialization of worker eventLoop > handling chunk fetch requests within TransportContext. This fixes > ChunkFetchIntegrationSuite as well. > While the test was only flaky in the 3.0 branch, it seems possible the same > code path could be triggered in 2.4 so consider for backport. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31542) Backport SPARK-25692 Remove static initialization of worker eventLoop handling chunk fetch requests within TransportContext. This fixes ChunkFetchIntegrationSuite
[ https://issues.apache.org/jira/browse/SPARK-31542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091060#comment-17091060 ] Shixiong Zhu commented on SPARK-31542: -- [~holden] The flaky test was caused by a new improvement in 3.0: SPARK-24355 It doesn't impact branch-2.4. > Backport SPARK-25692 Remove static initialization of worker eventLoop > handling chunk fetch requests within TransportContext. This fixes > ChunkFetchIntegrationSuite as well > > > Key: SPARK-31542 > URL: https://issues.apache.org/jira/browse/SPARK-31542 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.6 >Reporter: Holden Karau >Priority: Major > > Backport SPARK-25692 Remove static initialization of worker eventLoop > handling chunk fetch requests within TransportContext. This fixes > ChunkFetchIntegrationSuite as well. > While the test was only flaky in the 3.0 branch, it seems possible the same > code path could be triggered in 2.4 so consider for backport. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31245) Document the backward incompatibility issue for Encoders.kryo and Encoders.javaSerialization
Shixiong Zhu created SPARK-31245: Summary: Document the backward incompatibility issue for Encoders.kryo and Encoders.javaSerialization Key: SPARK-31245 URL: https://issues.apache.org/jira/browse/SPARK-31245 Project: Spark Issue Type: Documentation Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Shixiong Zhu We should document that when a user uses Encoders.kryo or Encoders.javaSerialization for the state store, we don't guarantee binary compatibility across different Spark versions, because these two types of encoders can be broken easily. Maybe add a new section after http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovery-semantics-after-changes-in-a-streaming-query
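Why Kryo- or Java-serialized state "can be broken easily" comes down to the encoding carrying no schema: bytes written against one class layout are misread, or fail to read at all, against another. The toy sketch below models this with Python's `struct` module (the `encode_v1`/`decode_v2` names and field layouts are invented for illustration; this is not Kryo's actual wire format):

```python
import struct

def encode_v1(count: int, total: int) -> bytes:
    """'Version 1' of some state: two little-endian int32 fields."""
    return struct.pack("<ii", count, total)

def decode_v2(data: bytes):
    """'Version 2' widened `total` to int64 and reordered the fields.

    Because the binary format carries no schema (just like Kryo- or
    Java-serialized state), a v2 reader cannot recover v1 bytes: the
    sizes no longer line up and unpacking fails outright.
    """
    total, count = struct.unpack("<qi", data)
    return total, count
```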
[jira] [Commented] (SPARK-28556) Error should also be sent to QueryExecutionListener.onFailure
[ https://issues.apache.org/jira/browse/SPARK-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059259#comment-17059259 ] Shixiong Zhu commented on SPARK-28556: -- This change has been reverted. See SPARK-31144 for the new fix. > Error should also be sent to QueryExecutionListener.onFailure > - > > Key: SPARK-28556 > URL: https://issues.apache.org/jira/browse/SPARK-28556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > > Right now Error is not sent to QueryExecutionListener.onFailure. If there is > any Error when running a query, QueryExecutionListener.onFailure cannot be > triggered. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28556) Error should also be sent to QueryExecutionListener.onFailure
[ https://issues.apache.org/jira/browse/SPARK-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28556: - Docs Text: (was: In Spark 3.0, the type of "error" parameter in the "org.apache.spark.sql.util.QueryExecutionListener.onFailure" method is changed to "java.lang.Throwable" from "java.lang.Exception" to accept more types of failures such as "java.lang.Error" and its subclasses.) > Error should also be sent to QueryExecutionListener.onFailure > - > > Key: SPARK-28556 > URL: https://issues.apache.org/jira/browse/SPARK-28556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > > Right now Error is not sent to QueryExecutionListener.onFailure. If there is > any Error when running a query, QueryExecutionListener.onFailure cannot be > triggered. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28556) Error should also be sent to QueryExecutionListener.onFailure
[ https://issues.apache.org/jira/browse/SPARK-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28556: - Labels: (was: release-notes) > Error should also be sent to QueryExecutionListener.onFailure > - > > Key: SPARK-28556 > URL: https://issues.apache.org/jira/browse/SPARK-28556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > > Right now Error is not sent to QueryExecutionListener.onFailure. If there is > any Error when running a query, QueryExecutionListener.onFailure cannot be > triggered. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31144) Wrap java.lang.Error with an exception for QueryExecutionListener.onFailure
[ https://issues.apache.org/jira/browse/SPARK-31144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-31144: - Issue Type: Bug (was: Test) > Wrap java.lang.Error with an exception for QueryExecutionListener.onFailure > --- > > Key: SPARK-31144 > URL: https://issues.apache.org/jira/browse/SPARK-31144 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Priority: Major > > SPARK-28556 changed the method QueryExecutionListener.onFailure to allow > Spark sending java.lang.Error to this method. As this change breaks APIs, we > cannot fix branch-2.4. > [~marmbrus] suggested to wrap java.lang.Error with an exception instead to > avoid a breaking change. A bonus of this solution is we can also fix the > issue (if a query throws java.lang.Error, QueryExecutionListener doesn't get > notified) in branch-2.4. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31144) Wrap java.lang.Error with an exception for QueryExecutionListener.onFailure
Shixiong Zhu created SPARK-31144: Summary: Wrap java.lang.Error with an exception for QueryExecutionListener.onFailure Key: SPARK-31144 URL: https://issues.apache.org/jira/browse/SPARK-31144 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Shixiong Zhu SPARK-28556 changed the method QueryExecutionListener.onFailure to allow Spark to send java.lang.Error to this method. As this change breaks APIs, we cannot fix branch-2.4. [~marmbrus] suggested wrapping java.lang.Error with an exception instead to avoid a breaking change. A bonus of this solution is that we can also fix the issue (if a query throws java.lang.Error, QueryExecutionListener doesn't get notified) in branch-2.4.
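The wrapping approach suggested above can be modeled compactly: rather than widening the listener signature to Throwable (a breaking API change), a fatal error is wrapped in an Exception subclass before the listener is notified. The sketch below is a hypothetical Python analogue (the `QueryTerminatedWithError` and `notify_on_failure` names are invented; Python's BaseException-vs-Exception split stands in for Java's Throwable-vs-Exception):

```python
class QueryTerminatedWithError(Exception):
    """Carries a fatal error to listeners whose API only accepts Exception."""
    def __init__(self, error: BaseException):
        super().__init__(f"Query terminated with error: {error!r}")
        self.error = error  # keep the original for inspection

def notify_on_failure(listener, func_name: str, error: BaseException) -> None:
    # Listeners are typed to Exception; wrap anything that is not one so
    # they still get notified instead of being silently skipped.
    exc = error if isinstance(error, Exception) else QueryTerminatedWithError(error)
    listener.on_failure(func_name, exc)
```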
[jira] [Created] (SPARK-30984) Add UI test for Structured Streaming UI
Shixiong Zhu created SPARK-30984: Summary: Add UI test for Structured Streaming UI Key: SPARK-30984 URL: https://issues.apache.org/jira/browse/SPARK-30984 Project: Spark Issue Type: Test Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu It's better to add some basic UI test for Structured Streaming UI.
[jira] [Resolved] (SPARK-30943) Show "batch ID" in tool tip string for Structured Streaming UI graphs
[ https://issues.apache.org/jira/browse/SPARK-30943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-30943. -- Fix Version/s: 3.0.0 Assignee: Jungtaek Lim Resolution: Fixed > Show "batch ID" in tool tip string for Structured Streaming UI graphs > - > > Key: SPARK-30943 > URL: https://issues.apache.org/jira/browse/SPARK-30943 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > While we introduced the new UI for Structured Streaming, it was basically the > port of Streaming UI to Structured Streaming. > That missed the major difference between twos, ID of the "batch". In Spark > Streaming (DStream), timestamp is the key of the batch and you will need to > find any relevant information with such timestamp. > In Structured Streaming, batchId is the key of the batch. It's not a simple > difference because there're some known latency issues on periodical batches > (at least in FileStreamSource and FileStreamSink) and it's bothering to find > the right batchId if you only know about timestamp where the most of things > are logged with the batchId in Structured Streaming world. > This issue tracks the efforts on showing batchId as main aspect of x axis, > but only in tooltip string as changing x axis breaks coupling between Spark > Streaming and Structured Streaming and lots of code should be duplicated and > rewritten. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30936) Forwards-compatibility in JsonProtocol is broken
[ https://issues.apache.org/jira/browse/SPARK-30936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-30936: - Summary: Forwards-compatibility in JsonProtocol in broken (was: Fix the broken forwards-compatibility in JsonProtocol) > Forwards-compatibility in JsonProtocol in broken > > > Key: SPARK-30936 > URL: https://issues.apache.org/jira/browse/SPARK-30936 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Priority: Major > > JsonProtocol is supposed to provide strong backwards-compatibility and > forwards-compatibility guarantees: any version of Spark should be able to > read JSON output written by any other version, including newer versions. > However, the forwards-compatibility guarantee is broken for events parsed by > "ObjectMapper". If a new field is added to an event parsed by "ObjectMapper" > (e.g., > https://github.com/apache/spark/commit/6dc5921e66d56885b95c07e56e687f9f6c1eaca7#diff-dc5c7a41fbb7479cef48b67eb41ad254R33), > this event cannot be parsed by an old version of Spark History Server. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30936) Fix the broken forwards-compatibility in JsonProtocol
Shixiong Zhu created SPARK-30936: Summary: Fix the broken forwards-compatibility in JsonProtocol Key: SPARK-30936 URL: https://issues.apache.org/jira/browse/SPARK-30936 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: Shixiong Zhu JsonProtocol is supposed to provide strong backwards-compatibility and forwards-compatibility guarantees: any version of Spark should be able to read JSON output written by any other version, including newer versions. However, the forwards-compatibility guarantee is broken for events parsed by "ObjectMapper". If a new field is added to an event parsed by "ObjectMapper" (e.g., https://github.com/apache/spark/commit/6dc5921e66d56885b95c07e56e687f9f6c1eaca7#diff-dc5c7a41fbb7479cef48b67eb41ad254R33), this event cannot be parsed by an old version of Spark History Server.
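The forwards-compatibility property the ticket describes — an old reader surviving fields added by a newer writer — hinges on the parser dropping unknown fields instead of rejecting them. A minimal sketch in plain Python (the `KNOWN_FIELDS` set and `parse_event` helper are invented for illustration; they are not Spark's JsonProtocol, whose events are parsed by a Jackson ObjectMapper):

```python
import json

# Fields this (hypothetical) "old" reader version knows about.
KNOWN_FIELDS = {"event", "taskId", "stageId"}

def parse_event(payload: str) -> dict:
    """Tolerant parse: keep known fields, silently drop unknown ones.

    A strict mapper that rejects unknown fields breaks as soon as a newer
    writer adds one; dropping unknowns preserves forwards-compatibility.
    """
    raw = json.loads(payload)
    return {k: v for k, v in raw.items() if k in KNOWN_FIELDS}
```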
[jira] [Created] (SPARK-30927) StreamingQueryManager should avoid keeping reference to terminated StreamingQuery
Shixiong Zhu created SPARK-30927: Summary: StreamingQueryManager should avoid keeping reference to terminated StreamingQuery Key: SPARK-30927 URL: https://issues.apache.org/jira/browse/SPARK-30927 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Shixiong Zhu Right now StreamingQueryManager will keep the last terminated query until "resetTerminated" is called. When the last terminated query holds lots of state (a large SQL plan, cached RDDs, etc.), this wastes memory unnecessarily.
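One way to model the improvement is to hold the terminated query only weakly, so the query object (and everything it pins: plans, cached data) becomes collectable while termination bookkeeping still works. This is an illustrative sketch, not Spark's actual implementation (the `QueryManager` class is invented):

```python
import weakref

class QueryManager:
    """Tracks the last terminated query without keeping it alive."""

    def __init__(self):
        self._last_terminated = None

    def on_terminated(self, query) -> None:
        # A weak reference lets the garbage collector reclaim the query
        # and its state once no strong references remain.
        self._last_terminated = weakref.ref(query)

    def last_terminated(self):
        """Return the query if it is still alive, else None."""
        return self._last_terminated() if self._last_terminated else None
```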
[jira] [Commented] (SPARK-15006) Generated JavaDoc should hide package private objects
[ https://issues.apache.org/jira/browse/SPARK-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033886#comment-17033886 ] Shixiong Zhu commented on SPARK-15006: -- cc [~jodersky] has this fixed in the upstream? > Generated JavaDoc should hide package private objects > - > > Key: SPARK-15006 > URL: https://issues.apache.org/jira/browse/SPARK-15006 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Priority: Major > Labels: bulk-closed > > After we switch to the official release of genjavadoc in SPARK-14511, package > private objects are no longer hidden in the generated JavaDoc. This JIRA is > to track this upstream issue and update genjavadoc in Spark when there comes > a fix in the upstream. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-15006) Generated JavaDoc should hide package private objects
[ https://issues.apache.org/jira/browse/SPARK-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-15006: -- Reopening this since the issue still exists (See http://spark.apache.org/docs/2.4.5/api/java/org/apache/spark/util/Utils.html ) It would be great that we can fix this. > Generated JavaDoc should hide package private objects > - > > Key: SPARK-15006 > URL: https://issues.apache.org/jira/browse/SPARK-15006 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Priority: Major > Labels: bulk-closed > > After we switch to the official release of genjavadoc in SPARK-14511, package > private objects are no longer hidden in the generated JavaDoc. This JIRA is > to track this upstream issue and update genjavadoc in Spark when there comes > a fix in the upstream. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30779) Review and fix issues in Structured Streaming API docs
Shixiong Zhu created SPARK-30779: Summary: Review and fix issues in Structured Streaming API docs Key: SPARK-30779 URL: https://issues.apache.org/jira/browse/SPARK-30779 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Shixiong Zhu We should do an API review to ensure we don't leak public APIs unintentionally.
[jira] [Created] (SPARK-30778) Javadoc of Java classes doesn't show up in Scala doc
Shixiong Zhu created SPARK-30778: Summary: Javadoc of Java classes doesn't show up in Scala doc Key: SPARK-30778 URL: https://issues.apache.org/jira/browse/SPARK-30778 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 2.4.5 Reporter: Shixiong Zhu There are some APIs written in Java. However, Javadoc of Java classes doesn't show up in Scala doc. For example, http://spark.apache.org/docs/2.4.5/api/scala/index.html#org.apache.spark.sql.SaveMode Scala users need to look at Javadoc for these Java classes right now. This looks like a unidoc bug, but it's worth investigating whether we can fix it since we are adding more and more Java classes.
[jira] [Commented] (SPARK-30657) Streaming limit after streaming dropDuplicates can throw error
[ https://issues.apache.org/jira/browse/SPARK-30657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028223#comment-17028223 ] Shixiong Zhu commented on SPARK-30657: -- [~tdas] Make sense. Agreed that the risk is high but the benefit is pretty low. We can backport it later whenever needed. > Streaming limit after streaming dropDuplicates can throw error > -- > > Key: SPARK-30657 > URL: https://issues.apache.org/jira/browse/SPARK-30657 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Critical > Fix For: 3.0.0 > > > {{LocalLimitExec}} does not consume the iterator of the child plan. So if > there is a limit after a stateful operator like streaming dedup in append > mode (e.g. {{streamingdf.dropDuplicates().limit(5}})), the state changes of > streaming duplicate may not be committed (most stateful ops commit state > changes only after the generated iterator is fully consumed). This leads to > the next batch failing with {{java.lang.IllegalStateException: Error reading > delta file .../N.delta does not exist}} as the state store delta file was > never generated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30658) Limit on streaming dataframe before streaming agg returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-30658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-30658: - Fix Version/s: 3.0.0 > Limit after on streaming dataframe before streaming agg returns wrong results > - > > Key: SPARK-30658 > URL: https://issues.apache.org/jira/browse/SPARK-30658 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4 >Reporter: Tathagata Das >Priority: Critical > Fix For: 3.0.0 > > > Limit before a streaming aggregate (i.e. [[df.limit(5).groupBy().count()}}) > in complete mode was not being planned as a streaming limit. The planner rule > planned a logical limit with a stateful streaming limit plan only if the > query is in append mode. As a result, instead of allowing max 5 rows across > batches, the planned streaming query was allowing 5 rows in every batch thus > producing incorrect results. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
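The SPARK-30658 bug above is about where the limit's state lives: a correct streaming limit allows at most n rows across *all* batches by persisting a running count, whereas the buggy plan allowed n rows in every batch. A minimal sketch of the intended semantics (the `StreamingGlobalLimit` class is invented for illustration; it is not Spark's stateful operator, whose count lives in a state store):

```python
class StreamingGlobalLimit:
    """Allow at most `n` rows in total across all processed batches."""

    def __init__(self, n: int):
        self.n = n
        self.emitted = 0  # stands in for the operator's persisted state

    def process_batch(self, rows: list) -> list:
        # Emit only as many rows as the global budget still allows.
        out = rows[: max(0, self.n - self.emitted)]
        self.emitted += len(out)
        return out
```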
[jira] [Updated] (SPARK-30657) Streaming limit after streaming dropDuplicates can throw error
[ https://issues.apache.org/jira/browse/SPARK-30657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-30657: - Fix Version/s: 3.0.0 > Streaming limit after streaming dropDuplicates can throw error > -- > > Key: SPARK-30657 > URL: https://issues.apache.org/jira/browse/SPARK-30657 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Critical > Fix For: 3.0.0 > > > {{LocalLimitExec}} does not consume the iterator of the child plan. So if > there is a limit after a stateful operator like streaming dedup in append > mode (e.g. {{streamingdf.dropDuplicates().limit(5}})), the state changes of > streaming duplicate may not be committed (most stateful ops commit state > changes only after the generated iterator is fully consumed). This leads to > the next batch failing with {{java.lang.IllegalStateException: Error reading > delta file .../N.delta does not exist}} as the state store delta file was > never generated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
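The SPARK-30657 failure mode above — a limit that stops early leaves an upstream stateful operator's commit hook unfired — can be modeled with a plain iterator. This is a hypothetical sketch (the `local_limit` function and `on_exhausted` callback are invented), not Spark's physical LocalLimitExec:

```python
from typing import Callable, Iterator, List

def local_limit(child: Iterator, n: int, on_exhausted: Callable[[], None]) -> List:
    """Take up to `n` rows but still drain the child iterator.

    Many stateful operators commit their state changes only after their
    generated iterator is fully consumed (modeled here by `on_exhausted`).
    Stopping after `n` rows would leave those changes uncommitted, so we
    keep iterating past `n` purely to exhaust the child.
    """
    taken = []
    for row in child:
        if len(taken) < n:
            taken.append(row)
        # rows beyond `n` are consumed and discarded, not emitted
    on_exhausted()
    return taken
```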
[jira] [Resolved] (SPARK-30656) Support the "minPartitions" option in Kafka batch source and streaming source v1
[ https://issues.apache.org/jira/browse/SPARK-30656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-30656. -- Fix Version/s: 3.0.0 Resolution: Done > Support the "minPartitions" option in Kafka batch source and streaming source > v1 > > > Key: SPARK-30656 > URL: https://issues.apache.org/jira/browse/SPARK-30656 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > > Right now, the "minPartitions" option only works in Kafka streaming source > v2. It would be great that we can support it in batch and streaming source v1 > (v1 is the fallback mode when a user hits a regression in v2) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
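Conceptually, the "minPartitions" option asks the planner to split Kafka offset ranges into at least that many Spark partitions, proportionally to each range's size. The sketch below is a simplified illustration (the `split_offset_ranges` helper is invented; real Spark weighs ranges and handles rounding and locality differently):

```python
def split_offset_ranges(ranges, min_partitions):
    """Split (topic-partition, start, end) offset ranges into at least
    `min_partitions` pieces, proportionally to each range's size.
    """
    total = sum(end - start for _, start, end in ranges)
    # Nothing to do if there is no data or already enough partitions.
    if total == 0 or min_partitions <= len(ranges):
        return list(ranges)
    out = []
    for tp, start, end in ranges:
        size = end - start
        # Pieces for this range, proportional to its share of all offsets.
        parts = max(1, round(min_partitions * size / total))
        step = size / parts
        bounds = [start + int(i * step) for i in range(parts)] + [end]
        for a, b in zip(bounds, bounds[1:]):
            if b > a:
                out.append((tp, a, b))
    return out
```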
[jira] [Resolved] (SPARK-29543) Support Structured Streaming UI
[ https://issues.apache.org/jira/browse/SPARK-29543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-29543. -- Fix Version/s: 3.0.0 Assignee: Genmao Yu Resolution: Done > Support Structured Streaming UI > --- > > Key: SPARK-29543 > URL: https://issues.apache.org/jira/browse/SPARK-29543 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming, Web UI >Affects Versions: 3.0.0 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Fix For: 3.0.0 > > > Open this jira to support structured streaming UI -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28556) Error should also be sent to QueryExecutionListener.onFailure
[ https://issues.apache.org/jira/browse/SPARK-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026098#comment-17026098 ] Shixiong Zhu commented on SPARK-28556: -- This also reminds me that we should also review all public APIs to see if there is any similar issue, since 3.0.0 is a good chance to fix API bugs. > Error should also be sent to QueryExecutionListener.onFailure > - > > Key: SPARK-28556 > URL: https://issues.apache.org/jira/browse/SPARK-28556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Right now Error is not sent to QueryExecutionListener.onFailure. If there is > any Error when running a query, QueryExecutionListener.onFailure cannot be > triggered. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28556) Error should also be sent to QueryExecutionListener.onFailure
[ https://issues.apache.org/jira/browse/SPARK-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025655#comment-17025655 ] Shixiong Zhu commented on SPARK-28556: -- This is not deprecating the API. We just fixed an issue in an experimental API. Hence, I don't think we need any deprecation note. > Error should also be sent to QueryExecutionListener.onFailure > - > > Key: SPARK-28556 > URL: https://issues.apache.org/jira/browse/SPARK-28556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Right now Error is not sent to QueryExecutionListener.onFailure. If there is > any Error when running a query, QueryExecutionListener.onFailure cannot be > triggered. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30656) Support the "minPartitions" option in Kafka batch source and streaming source v1
[ https://issues.apache.org/jira/browse/SPARK-30656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-30656: - Issue Type: Improvement (was: Bug) > Support the "minPartitions" option in Kafka batch source and streaming source > v1 > > > Key: SPARK-30656 > URL: https://issues.apache.org/jira/browse/SPARK-30656 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > Right now, the "minPartitions" option only works in Kafka streaming source > v2. It would be great that we can support it in batch and streaming source v1 > (v1 is the fallback mode when a user hits a regression in v2) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26060) Track SparkConf entries and make SET command reject such entries.
[ https://issues.apache.org/jira/browse/SPARK-26060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26060: - Labels: release-notes (was: ) > Track SparkConf entries and make SET command reject such entries. > - > > Key: SPARK-26060 > URL: https://issues.apache.org/jira/browse/SPARK-26060 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Currently the {{SET}} command works without any warnings even if the > specified key is for {{SparkConf}} entries and it has no effect because the > command does not update {{SparkConf}}, but the behavior might confuse users. > We should track {{SparkConf}} entries and make the command reject for such > entries. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19287) JavaPairRDD flatMapValues requires function returning Iterable, not Iterator
[ https://issues.apache.org/jira/browse/SPARK-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-19287: - Docs Text: JavaPairRDD/JavaPairDStream.flatMapValues() now requires a FlatMapFunction as an argument. This means this function now must return an Iterator, not Iterable. This corrects a long-standing inconsistency between the Scala and Java API, and allows the caller to supply merely an Iterator, not a full Iterable. Existing functions passed to this method can simply invoke ".iterator()" on their existing return value to comply with the new signature. (was: JavaPairRDD.flatMapValues() now requires a FlatMapFunction as an argument. This means this function now must return an Iterator, not Iterable. This corrects a long-standing inconsistency between the Scala and Java API, and allows the caller to supply merely an Iterator, not a full Iterable. Existing functions passed to this method can simply invoke ".iterator()" on their existing return value to comply with the new signature.) > JavaPairRDD flatMapValues requires function returning Iterable, not Iterator > > > Key: SPARK-19287 > URL: https://issues.apache.org/jira/browse/SPARK-19287 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 2.1.1 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Labels: release-notes > Fix For: 3.0.0 > > > SPARK-3369 corrected an old oversight in the Java API, wherein > {{FlatMapFunction}} required an {{Iterable}} rather than {{Iterator}}. 
As > reported by [~akrim], it seems that this same type of problem was overlooked > also in {{JavaPairRDD}} > (https://github.com/apache/spark/blob/6c00c069e3c3f5904abd122cea1d56683031cca0/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677 > ): > {code} > def flatMapValues[U](f: JFunction[V, java.lang.Iterable[U]]): JavaPairRDD[K, > U] = > {code} > As in {{PairRDDFunctions.scala}}, whose {{flatMapValues}} operates on > {{TraversableOnce}}, this should really take a function that returns an > {{Iterator}} -- really, {{FlatMapFunction}}. > We can easily add an overload and deprecate the existing method. > {code} > def flatMapValues[U](f: FlatMapFunction[V, U]): JavaPairRDD[K, U] > {code} > This is source- and binary-backwards-compatible, in Java 7. It's > binary-backwards-compatible in Java 8, but not source-compatible. The > following natural usage with Java 8 lambdas becomes ambiguous and won't > compile -- Java won't figure out which to implement even based on the return > type unfortunately: > {code} > JavaPairRDD pairRDD = ... > JavaPairRDD mappedRDD = > pairRDD.flatMapValues(s -> Arrays.asList(s.length()).iterator()); > {code} > It can be resolved by explicitly casting the lambda. > We can at least document this. One day in Spark 3.x this can just be changed > outright. > It's conceivable to resolve this by making the new method called > "flatMapValues2" or something ugly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30656) Support the "minPartitions" option in Kafka batch source and streaming source v1
Shixiong Zhu created SPARK-30656: Summary: Support the "minPartitions" option in Kafka batch source and streaming source v1 Key: SPARK-30656 URL: https://issues.apache.org/jira/browse/SPARK-30656 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu Right now, the "minPartitions" option only works in Kafka streaming source v2. It would be great if we could support it in batch and streaming source v1 (v1 is the fallback mode when a user hits a regression in v2) as well.
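To make the intent of "minPartitions" concrete, here is a simplified, Spark-free sketch of the underlying idea: splitting each Kafka topic-partition's offset range into enough pieces that at least minPartitions input splits are produced. The OffsetRange type and the proportional-splitting heuristic are illustrative assumptions, not Spark's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class MinPartitionsSketch {
    // One Kafka topic-partition's unread offset range (name is illustrative).
    public record OffsetRange(String topicPartition, long from, long until) {
        long size() { return until - from; }
    }

    // Split ranges so that at least minPartitions splits come out, dividing
    // each range proportionally to its share of the total backlog. This is
    // a sketch of the idea only, not Spark's actual algorithm.
    public static List<OffsetRange> split(List<OffsetRange> ranges, int minPartitions) {
        long total = ranges.stream().mapToLong(OffsetRange::size).sum();
        List<OffsetRange> out = new ArrayList<>();
        for (OffsetRange r : ranges) {
            // Pieces for this range, proportional to its size; at least one.
            int parts = Math.max(1,
                    (int) Math.round((double) r.size() / total * minPartitions));
            long step = Math.max(1, r.size() / parts);
            long start = r.from;
            for (int i = 0; i < parts; i++) {
                long end = (i == parts - 1) ? r.until : Math.min(r.until, start + step);
                out.add(new OffsetRange(r.topicPartition, start, end));
                start = end;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<OffsetRange> backlog = List.of(
                new OffsetRange("topic-0", 0, 100),
                new OffsetRange("topic-1", 0, 100));
        // 2 Kafka partitions, minPartitions = 4 -> 4 splits of 50 offsets each.
        split(backlog, 4).forEach(System.out::println);
    }
}
```

The point of the option is exactly this decoupling: the number of Spark input splits no longer has to equal the number of Kafka partitions, which helps when a few Kafka partitions hold most of the data.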
[jira] [Updated] (SPARK-25382) Remove ImageSchema.readImages in 3.0
[ https://issues.apache.org/jira/browse/SPARK-25382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25382: - Docs Text: In Spark 3.0.0, the deprecated ImageSchema class and its readImages methods have been removed. Use `spark.read.format("image").load(path)` instead. (was: In Spark 3.0.0, the deprecated ImageSchema class and its readImages methods have been removed. Use `spark.read.format(\"image\").load(path)` instead.) > Remove ImageSchema.readImages in 3.0 > > > Key: SPARK-25382 > URL: https://issues.apache.org/jira/browse/SPARK-25382 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Weichen Xu >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > A follow-up task from SPARK-25345. We might need to support sampling > (SPARK-25383) in order to remove readImages.
[jira] [Updated] (SPARK-25382) Remove ImageSchema.readImages in 3.0
[ https://issues.apache.org/jira/browse/SPARK-25382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25382: - Docs Text: In Spark 3.0.0, the deprecated ImageSchema class and its readImages methods have been removed. Use `spark.read.format(\"image\").load(path)` instead. (was: In Spark 3.0.0, the deprecated ImageSchema class and its readImages methods have been removed.) > Remove ImageSchema.readImages in 3.0 > > > Key: SPARK-25382 > URL: https://issues.apache.org/jira/browse/SPARK-25382 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Weichen Xu >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > A follow-up task from SPARK-25345. We might need to support sampling > (SPARK-25383) in order to remove readImages.
[jira] [Updated] (SPARK-26132) Remove support for Scala 2.11 in Spark 3.0.0
[ https://issues.apache.org/jira/browse/SPARK-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26132: - Docs Text: Scala 2.11 support is removed in Apache Spark 3.0.0. > Remove support for Scala 2.11 in Spark 3.0.0 > > > Key: SPARK-26132 > URL: https://issues.apache.org/jira/browse/SPARK-26132 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Per some discussion on the mailing list, we are _considering_ formally not > supporting Scala 2.11 in Spark 3.0. This JIRA tracks that discussion.
[jira] [Updated] (SPARK-26132) Remove support for Scala 2.11 in Spark 3.0.0
[ https://issues.apache.org/jira/browse/SPARK-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26132: - Labels: release-notes (was: ) > Remove support for Scala 2.11 in Spark 3.0.0 > > > Key: SPARK-26132 > URL: https://issues.apache.org/jira/browse/SPARK-26132 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Per some discussion on the mailing list, we are _considering_ formally not > supporting Scala 2.11 in Spark 3.0. This JIRA tracks that discussion.
[jira] [Updated] (SPARK-24516) PySpark Bindings for K8S - make Python 3 the default
[ https://issues.apache.org/jira/browse/SPARK-24516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-24516: - Labels: release-no (was: ) > PySpark Bindings for K8S - make Python 3 the default > > > Key: SPARK-24516 > URL: https://issues.apache.org/jira/browse/SPARK-24516 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, PySpark >Affects Versions: 2.4.0 >Reporter: Ondrej Kokes >Assignee: Ilan Filonenko >Priority: Minor > Labels: release-no > Fix For: 3.0.0 > > > Initial PySpark-k8s bindings have just been resolved (SPARK-23984), but the > default Python version there is 2. While you can override this by setting it > to 3, I think we should have sensible defaults. > Python 3 has been around for ten years and is the clear successor; Python 2 > has only 18 months left in terms of support. There isn't a good reason to > suggest Python 2 should be used, not in 2018 and not when both versions are > supported. > The relevant commit [is > here|https://github.com/apache/spark/commit/1a644afbac35c204f9ad55f86999319a9ab458c6#diff-6e882d5561424e7e6651eb46f10104b8R194], > the version is also [in the > documentation|https://github.com/apache/spark/commit/1a644afbac35c204f9ad55f86999319a9ab458c6#diff-b5527f236b253e0d9f5db5164bdb43e9R643].
[jira] [Updated] (SPARK-24516) PySpark Bindings for K8S - make Python 3 the default
[ https://issues.apache.org/jira/browse/SPARK-24516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-24516: - Labels: release-notes (was: release-no) > PySpark Bindings for K8S - make Python 3 the default > > > Key: SPARK-24516 > URL: https://issues.apache.org/jira/browse/SPARK-24516 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, PySpark >Affects Versions: 2.4.0 >Reporter: Ondrej Kokes >Assignee: Ilan Filonenko >Priority: Minor > Labels: release-notes > Fix For: 3.0.0 > > > Initial PySpark-k8s bindings have just been resolved (SPARK-23984), but the > default Python version there is 2. While you can override this by setting it > to 3, I think we should have sensible defaults. > Python 3 has been around for ten years and is the clear successor; Python 2 > has only 18 months left in terms of support. There isn't a good reason to > suggest Python 2 should be used, not in 2018 and not when both versions are > supported. > The relevant commit [is > here|https://github.com/apache/spark/commit/1a644afbac35c204f9ad55f86999319a9ab458c6#diff-6e882d5561424e7e6651eb46f10104b8R194], > the version is also [in the > documentation|https://github.com/apache/spark/commit/1a644afbac35c204f9ad55f86999319a9ab458c6#diff-b5527f236b253e0d9f5db5164bdb43e9R643].
[jira] [Updated] (SPARK-25016) remove Support for hadoop 2.6
[ https://issues.apache.org/jira/browse/SPARK-25016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25016: - Docs Text: Hadoop 2.6.x support is removed. The default Hadoop version in Spark 3.0.0 is 2.7.x. > remove Support for hadoop 2.6 > - > > Key: SPARK-25016 > URL: https://issues.apache.org/jira/browse/SPARK-25016 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.0 >Reporter: Thomas Graves >Assignee: Sean R. Owen >Priority: Major > Labels: release_notes > Fix For: 3.0.0 > > > Hadoop 2.6 is now old; no releases have been done in 2 years, and it doesn't > look like they are patching it for security or bug fixes. We should stop > supporting it in Spark.
[jira] [Updated] (SPARK-25016) remove Support for hadoop 2.6
[ https://issues.apache.org/jira/browse/SPARK-25016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25016: - Labels: release_notes (was: ) > remove Support for hadoop 2.6 > - > > Key: SPARK-25016 > URL: https://issues.apache.org/jira/browse/SPARK-25016 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.0 >Reporter: Thomas Graves >Assignee: Sean R. Owen >Priority: Major > Labels: release_notes > Fix For: 3.0.0 > > > Hadoop 2.6 is now old; no releases have been done in 2 years, and it doesn't > look like they are patching it for security or bug fixes. We should stop > supporting it in Spark.
[jira] [Updated] (SPARK-29930) Remove SQL configs declared to be removed in Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-29930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-29930: - Labels: release-notes (was: ) > Remove SQL configs declared to be removed in Spark 3.0 > -- > > Key: SPARK-29930 > URL: https://issues.apache.org/jira/browse/SPARK-29930 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Labels: release-notes > Fix For: 3.0.0 > > > Need to remove the following SQL configs: > * spark.sql.fromJsonForceNullableSchema > * spark.sql.legacy.compareDateTimestampInTimestamp