[jira] [Updated] (SPARK-23788) Race condition in StreamingQuerySuite

2018-03-24 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23788:
-
Fix Version/s: 2.2.2

> Race condition in StreamingQuerySuite
> -
>
> Key: SPARK-23788
> URL: https://issues.apache.org/jira/browse/SPARK-23788
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Assignee: Jose Torres
>Priority: Minor
> Fix For: 2.2.2, 2.3.1, 2.4.0
>
>
> The serializability test uses the same MemoryStream instance for 3 different 
> queries. If any of those queries asks it to commit before the others have run, 
> the rest will see empty DataFrames. This can fail the test if q3 is affected.
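>
> A minimal sketch of the failure mode (hypothetical query shapes in a 
> spark-shell session; the actual suite code differs):
> {code}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.execution.streaming.MemoryStream
> import spark.implicits._
> implicit val sqlContext: SQLContext = spark.sqlContext
> // One shared source feeds three queries. If q1 processes and commits the
> // batch before q2 and q3 start, the later queries see a drained source.
> val input = MemoryStream[Int]
> input.addData(1, 2, 3)
> val q1 = input.toDF.writeStream.format("memory").queryName("q1").start()
> val q2 = input.toDF.writeStream.format("memory").queryName("q2").start()
> val q3 = input.toDF.writeStream.format("memory").queryName("q3").start()
> {code}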






[jira] [Resolved] (SPARK-23788) Race condition in StreamingQuerySuite

2018-03-24 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23788.
--
   Resolution: Fixed
 Assignee: Jose Torres
Fix Version/s: 2.4.0
   2.3.1

> Race condition in StreamingQuerySuite
> -
>
> Key: SPARK-23788
> URL: https://issues.apache.org/jira/browse/SPARK-23788
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Assignee: Jose Torres
>Priority: Minor
> Fix For: 2.3.1, 2.4.0
>
>
> The serializability test uses the same MemoryStream instance for 3 different 
> queries. If any of those queries asks it to commit before the others have run, 
> the rest will see empty DataFrames. This can fail the test if q3 is affected.






[jira] [Updated] (SPARK-23623) Avoid concurrent use of cached KafkaConsumer in CachedKafkaConsumer (kafka-0-10-sql)

2018-03-17 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23623:
-
Fix Version/s: 2.3.1

> Avoid concurrent use of cached KafkaConsumer in CachedKafkaConsumer 
> (kafka-0-10-sql)
> 
>
> Key: SPARK-23623
> URL: https://issues.apache.org/jira/browse/SPARK-23623
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Critical
> Fix For: 2.3.1, 2.4.0
>
>
> CachedKafkaConsumer in the `kafka-0-10-sql` project is designed to maintain a 
> pool of KafkaConsumers that can be reused. However, it was built with the 
> assumption that there will be only one task trying to read the same Kafka 
> TopicPartition at the same time. Hence, the cache was keyed by the 
> TopicPartition a consumer is supposed to read. And for any cases where this 
> assumption may not be true, we have a SparkPlan flag to disable the use of the 
> cache. So it was up to the planner to correctly identify when it was not safe 
> to use the cache and set the flag accordingly. 
> Fundamentally, this is the wrong way to approach the problem. It is HARD for 
> a high-level planner to reason about the low-level execution model, i.e. 
> whether there will be multiple tasks in the same query trying to read the 
> same partition. Case in point: 2.3.0 introduced stream-stream joins, and you 
> can build a streaming self-join query on Kafka. It's pretty non-trivial to 
> figure out how this leads to two tasks reading the same partition, possibly 
> concurrently. And due to that non-triviality, it is hard to handle this in 
> the planner and set the flag to avoid the cache / consumer pool. This can 
> inadvertently lead to {{ConcurrentModificationException}}, or worse, silent 
> reading of incorrect data.
> Here is a better way to design this. The planner shouldn't have to understand 
> these low-level optimizations. Rather, the consumer pool should be smart 
> enough to avoid concurrent use of a cached consumer. Currently, it tries to 
> do so but incorrectly (the flag {{inuse}} is not checked when returning a 
> cached consumer, see 
> [this|https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala#L403]).
> If there is another request for the same partition as a currently in-use 
> consumer, the pool should automatically return a fresh consumer that should 
> be closed when the task is done.
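>
> A minimal sketch of that pool behavior (hypothetical names, not the actual 
> CachedKafkaConsumer code): hand out the cached consumer only when it is 
> free, otherwise create a throwaway one that the task closes when done.
> {code}
> import scala.collection.mutable
> class ConsumerPool[K, C](create: K => C) {
>   private case class Entry(consumer: C, var inUse: Boolean)
>   private val cache = mutable.Map.empty[K, Entry]
>   // Returns (consumer, cached); a non-cached consumer must be closed by
>   // the caller when the task finishes.
>   def acquire(key: K): (C, Boolean) = synchronized {
>     cache.get(key) match {
>       case Some(e) if !e.inUse => e.inUse = true; (e.consumer, true)
>       case Some(_)             => (create(key), false) // cached one is busy
>       case None =>
>         val e = Entry(create(key), inUse = true)
>         cache(key) = e
>         (e.consumer, true)
>     }
>   }
>   def release(key: K): Unit = synchronized {
>     cache.get(key).foreach(_.inUse = false)
>   }
> }
> {code}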






[jira] [Resolved] (SPARK-23623) Avoid concurrent use of cached KafkaConsumer in CachedKafkaConsumer (kafka-0-10-sql)

2018-03-16 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23623.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

> Avoid concurrent use of cached KafkaConsumer in CachedKafkaConsumer 
> (kafka-0-10-sql)
> 
>
> Key: SPARK-23623
> URL: https://issues.apache.org/jira/browse/SPARK-23623
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Critical
> Fix For: 2.4.0
>
>
> CachedKafkaConsumer in the `kafka-0-10-sql` project is designed to maintain a 
> pool of KafkaConsumers that can be reused. However, it was built with the 
> assumption that there will be only one task trying to read the same Kafka 
> TopicPartition at the same time. Hence, the cache was keyed by the 
> TopicPartition a consumer is supposed to read. And for any cases where this 
> assumption may not be true, we have a SparkPlan flag to disable the use of the 
> cache. So it was up to the planner to correctly identify when it was not safe 
> to use the cache and set the flag accordingly. 
> Fundamentally, this is the wrong way to approach the problem. It is HARD for 
> a high-level planner to reason about the low-level execution model, i.e. 
> whether there will be multiple tasks in the same query trying to read the 
> same partition. Case in point: 2.3.0 introduced stream-stream joins, and you 
> can build a streaming self-join query on Kafka. It's pretty non-trivial to 
> figure out how this leads to two tasks reading the same partition, possibly 
> concurrently. And due to that non-triviality, it is hard to handle this in 
> the planner and set the flag to avoid the cache / consumer pool. This can 
> inadvertently lead to {{ConcurrentModificationException}}, or worse, silent 
> reading of incorrect data.
> Here is a better way to design this. The planner shouldn't have to understand 
> these low-level optimizations. Rather, the consumer pool should be smart 
> enough to avoid concurrent use of a cached consumer. Currently, it tries to 
> do so but incorrectly (the flag {{inuse}} is not checked when returning a 
> cached consumer, see 
> [this|https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala#L403]).
> If there is another request for the same partition as a currently in-use 
> consumer, the pool should automatically return a fresh consumer that should 
> be closed when the task is done.






[jira] [Assigned] (SPARK-23533) Add support for changing ContinuousDataReader's startOffset

2018-03-15 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-23533:


Assignee: Li Yuanjian

> Add support for changing ContinuousDataReader's startOffset
> ---
>
> Key: SPARK-23533
> URL: https://issues.apache.org/jira/browse/SPARK-23533
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Li Yuanjian
>Assignee: Li Yuanjian
>Priority: Major
> Fix For: 2.4.0
>
>
> As discussed in [https://github.com/apache/spark/pull/20675], we need to add 
> a new interface `ContinuousDataReaderFactory` to support setting the start 
> offset in Continuous Processing.
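>
> A hedged sketch of the interface shape (names assumed from the discussion, 
> not necessarily the merged API):
> {code}
> import org.apache.spark.sql.sources.v2.reader.DataReader
> import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset
> // Sketch: lets continuous execution re-create a partition's reader
> // starting from an arbitrary offset instead of the initial one.
> trait ContinuousDataReaderFactory[T] extends Serializable {
>   def createDataReaderWithOffset(offset: PartitionOffset): DataReader[T]
> }
> {code}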






[jira] [Resolved] (SPARK-23533) Add support for changing ContinuousDataReader's startOffset

2018-03-15 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23533.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 20689
[https://github.com/apache/spark/pull/20689]

> Add support for changing ContinuousDataReader's startOffset
> ---
>
> Key: SPARK-23533
> URL: https://issues.apache.org/jira/browse/SPARK-23533
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Li Yuanjian
>Assignee: Li Yuanjian
>Priority: Major
> Fix For: 2.4.0
>
>
> As discussed in [https://github.com/apache/spark/pull/20675], we need to add 
> a new interface `ContinuousDataReaderFactory` to support setting the start 
> offset in Continuous Processing.






[jira] [Updated] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-02-28 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-18057:
-
Summary: Update structured streaming kafka from 0.10.0.1 to 1.1.0  (was: 
Update structured streaming kafka from 10.0.1 to 10.2.0)

> Update structured streaming kafka from 0.10.0.1 to 1.1.0
> 
>
> Key: SPARK-18057
> URL: https://issues.apache.org/jira/browse/SPARK-18057
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Reporter: Cody Koeninger
>Priority: Major
>
> There are a couple of relevant KIPs here: 
> https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html
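>
> For context, the bump itself amounts to changing the kafka-0-10-sql 
> module's client dependency; in sbt terms the coordinate would look like 
> this (illustrative only; Spark's build actually declares it in the pom):
> {code}
> libraryDependencies += "org.apache.kafka" % "kafka-clients" % "1.1.0"
> {code}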






[jira] [Updated] (SPARK-23475) The "stages" page doesn't show any completed stages

2018-02-21 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23475:
-
Target Version/s: 2.3.0

> The "stages" page doesn't show any completed stages
> ---
>
> Key: SPARK-23475
> URL: https://issues.apache.org/jira/browse/SPARK-23475
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.39 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the "stages" 
> page, it will not show completed stages:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> Please see the attached screenshots.






[jira] [Resolved] (SPARK-23481) The job page shows wrong stages when some of the stages are evicted

2018-02-21 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23481.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20654
[https://github.com/apache/spark/pull/20654]

> The job page shows wrong stages when some of the stages are evicted
> ---
>
> Key: SPARK-23481
> URL: https://issues.apache.org/jira/browse/SPARK-23481
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: Screen Shot 2018-02-21 at 12.39.46 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the job 19 
> page, it will show wrong stage ids:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> Please see the attached screenshots.






[jira] [Updated] (SPARK-23475) The "stages" page doesn't show any completed stages

2018-02-21 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23475:
-
Attachment: (was: Screen Shot 2018-02-21 at 12.39.46 AM.png)

> The "stages" page doesn't show any completed stages
> ---
>
> Key: SPARK-23475
> URL: https://issues.apache.org/jira/browse/SPARK-23475
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.39 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the "stages" 
> page, it will not show completed stages:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> The stages in the job page are also wrong. Please see the attached screenshots.






[jira] [Commented] (SPARK-23481) The job page shows wrong stages when some of the stages are evicted

2018-02-21 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371902#comment-16371902
 ] 

Shixiong Zhu commented on SPARK-23481:
--

[~vanzin] Yep. Here is the fix with a regression test: 
https://github.com/apache/spark/pull/20654

> The job page shows wrong stages when some of the stages are evicted
> ---
>
> Key: SPARK-23481
> URL: https://issues.apache.org/jira/browse/SPARK-23481
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.46 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the job 19 
> page, it will show wrong stage ids:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> Please see the attached screenshots.






[jira] [Created] (SPARK-23481) The job page shows wrong stages when some of the stages are evicted

2018-02-21 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23481:


 Summary: The job page shows wrong stages when some of the stages are evicted
 Key: SPARK-23481
 URL: https://issues.apache.org/jira/browse/SPARK-23481
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu









[jira] [Commented] (SPARK-23475) The "stages" page doesn't show any completed stages

2018-02-21 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371897#comment-16371897
 ] 

Shixiong Zhu commented on SPARK-23475:
--

The job page issue is a separate issue. Created SPARK-23481 to track it 
instead.

> The "stages" page doesn't show any completed stages
> ---
>
> Key: SPARK-23475
> URL: https://issues.apache.org/jira/browse/SPARK-23475
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.39 AM.png, Screen Shot 
> 2018-02-21 at 12.39.46 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the "stages" 
> page, it will not show completed stages:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> The stages in the job page are also wrong. Please see the attached screenshots.






[jira] [Updated] (SPARK-23481) The job page shows wrong stages when some of the stages are evicted

2018-02-21 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23481:
-
Description: 
Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
spark.ui.retainedStages=10", type the following codes and click the job 19 
page, it will show wrong stage ids:

{code}
val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()

(1 to 20).foreach { i =>
   rdd.repartition(10).count()
}
{code}

Please see the attached screenshots.

  was:
Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
spark.ui.retainedStages=10", type the following codes and click the job 19 
page, it will not wrong stage ids:

{code}
val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()

(1 to 20).foreach { i =>
   rdd.repartition(10).count()
}
{code}

Please see the attached screenshots.


> The job page shows wrong stages when some of the stages are evicted
> ---
>
> Key: SPARK-23481
> URL: https://issues.apache.org/jira/browse/SPARK-23481
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.46 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the job 19 
> page, it will show wrong stage ids:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> Please see the attached screenshots.






[jira] [Updated] (SPARK-23475) The "stages" page doesn't show any completed stages

2018-02-21 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23475:
-
Description: 
Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
spark.ui.retainedStages=10", type the following codes and click the "stages" 
page, it will not show completed stages:

{code}
val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()

(1 to 20).foreach { i =>
   rdd.repartition(10).count()
}
{code}

Please see the attached screenshots.

  was:
Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
spark.ui.retainedStages=10", type the following codes and click the "stages" 
page, it will not show completed stages:

{code}
val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()

(1 to 20).foreach { i =>
   rdd.repartition(10).count()
}
{code}

The stages in the job page is also wrong. Please see the attached screenshots.


> The "stages" page doesn't show any completed stages
> ---
>
> Key: SPARK-23475
> URL: https://issues.apache.org/jira/browse/SPARK-23475
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.39 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the "stages" 
> page, it will not show completed stages:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> Please see the attached screenshots.






[jira] [Updated] (SPARK-23481) The job page shows wrong stages when some of the stages are evicted

2018-02-21 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23481:
-
Description: 
Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
spark.ui.retainedStages=10", type the following codes and click the job 19 
page, it will not wrong stage ids:

{code}
val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()

(1 to 20).foreach { i =>
   rdd.repartition(10).count()
}
{code}

Please see the attached screenshots.

> The job page shows wrong stages when some of the stages are evicted
> ---
>
> Key: SPARK-23481
> URL: https://issues.apache.org/jira/browse/SPARK-23481
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.46 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the job 19 
> page, it will not wrong stage ids:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> Please see the attached screenshots.






[jira] [Updated] (SPARK-23481) The job page shows wrong stages when some of the stages are evicted

2018-02-21 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23481:
-
Attachment: Screen Shot 2018-02-21 at 12.39.46 AM.png

> The job page shows wrong stages when some of the stages are evicted
> ---
>
> Key: SPARK-23481
> URL: https://issues.apache.org/jira/browse/SPARK-23481
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.46 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the job 19 
> page, it will not wrong stage ids:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> Please see the attached screenshots.






[jira] [Updated] (SPARK-23406) Stream-stream self joins do not work

2018-02-21 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23406:
-
Fix Version/s: (was: 3.0.0)
   2.4.0

> Stream-stream self joins do not work
> --
>
> Key: SPARK-23406
> URL: https://issues.apache.org/jira/browse/SPARK-23406
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently a stream-stream self join throws the following error
> {code}
> val df = spark.readStream.format("rate").option("numRowsPerSecond", 
> "1").option("numPartitions", "1").load()
> display(df.withColumn("key", $"value" / 10).join(df.withColumn("key", 
> $"value" / 5), "key"))
> {code}
> error:
> {code}
> Failure when resolving conflicting references in Join:
> 'Join UsingJoin(Inner,List(key))
> :- Project [timestamp#850, value#851L, (cast(value#851L as double) / cast(10 
> as double)) AS key#855]
> : +- StreamingRelation 
> DataSource(org.apache.spark.sql.SparkSession@7f1d2a68,rate,List(),None,List(),None,Map(numPartitions
>  -> 1, numRowsPerSecond -> 1),None), rate, [timestamp#850, value#851L]
> +- Project [timestamp#850, value#851L, (cast(value#851L as double) / cast(5 
> as double)) AS key#860]
>  +- StreamingRelation 
> DataSource(org.apache.spark.sql.SparkSession@7f1d2a68,rate,List(),None,List(),None,Map(numPartitions
>  -> 1, numRowsPerSecond -> 1),None), rate, [timestamp#850, value#851L]
> Conflicting attributes: timestamp#850,value#851L
> ;;
> 'Join UsingJoin(Inner,List(key))
> :- Project [timestamp#850, value#851L, (cast(value#851L as double) / cast(10 
> as double)) AS key#855]
> : +- StreamingRelation 
> DataSource(org.apache.spark.sql.SparkSession@7f1d2a68,rate,List(),None,List(),None,Map(numPartitions
>  -> 1, numRowsPerSecond -> 1),None), rate, [timestamp#850, value#851L]
> +- Project [timestamp#850, value#851L, (cast(value#851L as double) / cast(5 
> as double)) AS key#860]
>  +- StreamingRelation 
> DataSource(org.apache.spark.sql.SparkSession@7f1d2a68,rate,List(),None,List(),None,Map(numPartitions
>  -> 1, numRowsPerSecond -> 1),None), rate, [timestamp#850, value#851L]
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)
>  at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:101)
>  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:378)
>  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:98)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:148)
>  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:98)
>  at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:101)
>  at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:71)
>  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:73)
>  at 
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3063)
>  at org.apache.spark.sql.Dataset.join(Dataset.scala:787)
>  at org.apache.spark.sql.Dataset.join(Dataset.scala:756)
>  at org.apache.spark.sql.Dataset.join(Dataset.scala:731)
> {code}
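>
> Until this is fixed, a workaround sketch (assuming a spark-shell session) is 
> to create two independent streaming DataFrames from the same source, so the 
> two join sides do not share attribute ids:
> {code}
> import spark.implicits._
> // Each call to load() produces a fresh StreamingRelation, so the join
> // sides no longer have conflicting references.
> def rateDF = spark.readStream.format("rate")
>   .option("numRowsPerSecond", "1").option("numPartitions", "1").load()
> val joined = rateDF.withColumn("key", $"value" / 10)
>   .join(rateDF.withColumn("key", $"value" / 5), "key")
> {code}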






[jira] [Commented] (SPARK-23475) The "stages" page doesn't show any completed stages

2018-02-21 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371098#comment-16371098
 ] 

Shixiong Zhu commented on SPARK-23475:
--

cc [~vanzin]

> The "stages" page doesn't show any completed stages
> ---
>
> Key: SPARK-23475
> URL: https://issues.apache.org/jira/browse/SPARK-23475
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.39 AM.png, Screen Shot 
> 2018-02-21 at 12.39.46 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the "stages" 
> page, it will not show completed stages:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> The stages in the job page are also wrong. Please see the attached screenshots.






[jira] [Created] (SPARK-23475) The "stages" page doesn't show any completed stages

2018-02-21 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23475:


 Summary: The "stages" page doesn't show any completed stages
 Key: SPARK-23475
 URL: https://issues.apache.org/jira/browse/SPARK-23475
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
Reporter: Shixiong Zhu
 Attachments: Screen Shot 2018-02-21 at 12.39.39 AM.png, Screen Shot 
2018-02-21 at 12.39.46 AM.png

Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
spark.ui.retainedStages=10", type the following codes and click the "stages" 
page, it will not show completed stages:

{code}
val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()

(1 to 20).foreach { i =>
   rdd.repartition(10).count()
}
{code}

The stages in the job page are also wrong. Please see the attached screenshots.






[jira] [Updated] (SPARK-23475) The "stages" page doesn't show any completed stages

2018-02-21 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23475:
-
Attachment: Screen Shot 2018-02-21 at 12.39.46 AM.png
Screen Shot 2018-02-21 at 12.39.39 AM.png

> The "stages" page doesn't show any completed stages
> ---
>
> Key: SPARK-23475
> URL: https://issues.apache.org/jira/browse/SPARK-23475
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
> Attachments: Screen Shot 2018-02-21 at 12.39.39 AM.png, Screen Shot 
> 2018-02-21 at 12.39.46 AM.png
>
>
> Run "bin/spark-shell --conf spark.ui.retainedJobs=10 --conf 
> spark.ui.retainedStages=10", type the following codes and click the "stages" 
> page, it will not show completed stages:
> {code}
> val rdd = sc.parallelize(0 to 100, 100).repartition(10).cache()
> (1 to 20).foreach { i =>
>rdd.repartition(10).count()
> }
> {code}
> The stages in the job page are also wrong. Please see the attached screenshots.






[jira] [Assigned] (SPARK-23434) Spark should not warn `metadata directory` for a HDFS file path

2018-02-20 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-23434:


Assignee: Dongjoon Hyun

> Spark should not warn `metadata directory` for a HDFS file path
> ---
>
> Key: SPARK-23434
> URL: https://issues.apache.org/jira/browse/SPARK-23434
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.2, 2.2.1, 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.4.0
>
>
> In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), 
> it warns with a misleading error message while looking up 
> `people.json/_spark_metadata`. The root cause of this situation is the 
> difference between `LocalFileSystem` and `DistributedFileSystem`: 
> `LocalFileSystem.exists()` returns `false`, but 
> `DistributedFileSystem.exists` raises an exception.
> {code}
> scala> spark.version
> res0: String = 2.4.0-SNAPSHOT
> scala> 
> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
> ++---+
> | age|   name|
> ++---+
> |null|Michael|
> |  30|   Andy|
> |  19| Justin|
> ++---+
> scala> spark.read.json("hdfs:///tmp/people.json")
> 18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for 
> metadata directory.
> 18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for 
> metadata directory.
> res6: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
> {code}
> {code}
> scala> spark.version
> res0: String = 2.2.1
> scala> spark.read.json("hdfs:///tmp/people.json").show
> 18/02/15 05:28:02 WARN FileStreamSink: Error while looking for metadata 
> directory.
> 18/02/15 05:28:02 WARN FileStreamSink: Error while looking for metadata 
> directory.
> {code}
> {code}
> scala> spark.version
> res0: String = 2.1.2
> scala> spark.read.json("hdfs:///tmp/people.json").show
> 18/02/15 05:29:53 WARN DataSource: Error while looking for metadata directory.
> ++---+
> | age|   name|
> ++---+
> |null|Michael|
> |  30|   Andy|
> |  19| Justin|
> ++---+
> {code}
> {code}
> scala> spark.version
> res0: String = 2.0.2
> scala> spark.read.json("hdfs:///tmp/people.json").show
> 18/02/15 05:25:24 WARN DataSource: Error while looking for metadata directory.
> {code}
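>
> A minimal sketch of the defensive pattern (assumed helper name, not the 
> actual patch): treat an exception from {{exists()}} like a missing metadata 
> path instead of logging the confusing warning.
> {code}
> import org.apache.hadoop.fs.{FileSystem, Path}
> // hasMetadata is a hypothetical helper for illustration only
> def hasMetadata(fs: FileSystem, path: Path): Boolean =
>   try fs.exists(new Path(path, "_spark_metadata"))
>   catch { case _: Exception => false }
> {code}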






[jira] [Resolved] (SPARK-23434) Spark should not warn `metadata directory` for a HDFS file path

2018-02-20 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23434.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 20616
[https://github.com/apache/spark/pull/20616]

> Spark should not warn `metadata directory` for a HDFS file path
> ---
>
> Key: SPARK-23434
> URL: https://issues.apache.org/jira/browse/SPARK-23434
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.2, 2.2.1, 2.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Fix For: 2.4.0
>
>
> In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), 
> it warns with a misleading error message while looking up 
> `people.json/_spark_metadata`. The root cause of this situation is the 
> difference between `LocalFileSystem` and `DistributedFileSystem`: 
> `LocalFileSystem.exists()` returns `false`, but 
> `DistributedFileSystem.exists` raises an exception.
> {code}
> scala> spark.version
> res0: String = 2.4.0-SNAPSHOT
> scala> 
> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
> ++---+
> | age|   name|
> ++---+
> |null|Michael|
> |  30|   Andy|
> |  19| Justin|
> ++---+
> scala> spark.read.json("hdfs:///tmp/people.json")
> 18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for 
> metadata directory.
> 18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for 
> metadata directory.
> res6: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
> {code}
> {code}
> scala> spark.version
> res0: String = 2.2.1
> scala> spark.read.json("hdfs:///tmp/people.json").show
> 18/02/15 05:28:02 WARN FileStreamSink: Error while looking for metadata 
> directory.
> 18/02/15 05:28:02 WARN FileStreamSink: Error while looking for metadata 
> directory.
> {code}
> {code}
> scala> spark.version
> res0: String = 2.1.2
> scala> spark.read.json("hdfs:///tmp/people.json").show
> 18/02/15 05:29:53 WARN DataSource: Error while looking for metadata directory.
> ++---+
> | age|   name|
> ++---+
> |null|Michael|
> |  30|   Andy|
> |  19| Justin|
> ++---+
> {code}
> {code}
> scala> spark.version
> res0: String = 2.0.2
> scala> spark.read.json("hdfs:///tmp/people.json").show
> 18/02/15 05:25:24 WARN DataSource: Error while looking for metadata directory.
> {code}






[jira] [Updated] (SPARK-23470) org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow

2018-02-20 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23470:
-
Description: 
I was testing 2.3.0 RC3 and found that it's easy to hit "read timeout" when 
accessing the All Jobs page. The stack dump says it was running 
"org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription".

{code}
"SparkUI-59" #59 daemon prio=5 os_prio=0 tid=0x7fc15b0a3000 nid=0x8dc 
runnable [0x7fc0ce9f8000]
   java.lang.Thread.State: RUNNABLE
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:154)
at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:248)
at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$2(InMemoryStore.java:214)
at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$36/1834982692.compare(Unknown
 Source)
at java.util.TimSort.binarySort(TimSort.java:296)
at java.util.TimSort.sort(TimSort.java:239)
at java.util.Arrays.sort(Arrays.java:1512)
at java.util.ArrayList.sort(ArrayList.java:1460)
at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:387)
at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:210)
at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
at 
java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.hasNext(InMemoryStore.java:278)
at 
org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:101)
at 
org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
at 
org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
at 
org.apache.spark.status.AppStatusStore.asOption(AppStatusStore.scala:408)
at 
org.apache.spark.ui.jobs.ApiHelper$.lastStageNameAndDescription(StagePage.scala:1014)
at 
org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:434)
at 
org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
at 
org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.ui.jobs.JobDataSource.&lt;init&gt;(AllJobsPage.scala:412)
at org.apache.spark.ui.jobs.JobPagedTable.&lt;init&gt;(AllJobsPage.scala:504)
at org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:246)
at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:295)
at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
{code}

According to the heap dump, there are 954 JobDataWrapper and 54690 
StageDataWrapper instances. It's obvious that the UI will be slow, since we 
need to sort the 54690 stage items for each of the 954 jobs.
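
A back-of-the-envelope estimate of the sorting cost per page load (assuming, 
as the stack dump suggests, that every job row sorts all retained stages):

{code}
val jobs = 954
val stages = 54690
// n * log2(n) comparisons per sort, repeated once per job row
val perSort = (stages * math.log(stages) / math.log(2)).toLong
println(s"~${jobs * perSort} comparisons")  // roughly 8.2e8
{code}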


  was:
I was testing 2.3.0 RC3 and 

[jira] [Updated] (SPARK-23470) org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow

2018-02-20 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23470:
-
Target Version/s: 2.3.0

> org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow
> --
>
> Key: SPARK-23470
> URL: https://issues.apache.org/jira/browse/SPARK-23470
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> I was testing 2.3.0 RC3 and found that it's easy to hit "read timeout" in 
> the Spark UI. The stack dump says it was running 
> "org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription".
> {code}
> "SparkUI-59" #59 daemon prio=5 os_prio=0 tid=0x7fc15b0a3000 nid=0x8dc 
> runnable [0x7fc0ce9f8000]
>java.lang.Thread.State: RUNNABLE
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:154)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:248)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$2(InMemoryStore.java:214)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$36/1834982692.compare(Unknown
>  Source)
>   at java.util.TimSort.binarySort(TimSort.java:296)
>   at java.util.TimSort.sort(TimSort.java:239)
>   at java.util.Arrays.sort(Arrays.java:1512)
>   at java.util.ArrayList.sort(ArrayList.java:1460)
>   at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:387)
>   at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:210)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
>   at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.hasNext(InMemoryStore.java:278)
>   at 
> org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:101)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
>   at 
> org.apache.spark.status.AppStatusStore.asOption(AppStatusStore.scala:408)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$.lastStageNameAndDescription(StagePage.scala:1014)
>   at 
> org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:434)
>   at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
>   at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
>   at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.ui.jobs.JobDataSource.&lt;init&gt;(AllJobsPage.scala:412)
>   at org.apache.spark.ui.jobs.JobPagedTable.&lt;init&gt;(AllJobsPage.scala:504)
>   at org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:246)
>   at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:295)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>

[jira] [Commented] (SPARK-23470) org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow

2018-02-20 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370531#comment-16370531
 ] 

Shixiong Zhu commented on SPARK-23470:
--

[~vanzin] could you make a PR to fix it? I can help you test the patch.

> org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow
> --
>
> Key: SPARK-23470
> URL: https://issues.apache.org/jira/browse/SPARK-23470
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> I was testing 2.3.0 RC3 and found that it's easy to hit "read timeout" in 
> the Spark UI. The stack dump says it was running 
> "org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription".
> {code}
> "SparkUI-59" #59 daemon prio=5 os_prio=0 tid=0x7fc15b0a3000 nid=0x8dc 
> runnable [0x7fc0ce9f8000]
>java.lang.Thread.State: RUNNABLE
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:154)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:248)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$2(InMemoryStore.java:214)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$36/1834982692.compare(Unknown
>  Source)
>   at java.util.TimSort.binarySort(TimSort.java:296)
>   at java.util.TimSort.sort(TimSort.java:239)
>   at java.util.Arrays.sort(Arrays.java:1512)
>   at java.util.ArrayList.sort(ArrayList.java:1460)
>   at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:387)
>   at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:210)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
>   at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.hasNext(InMemoryStore.java:278)
>   at 
> org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:101)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
>   at 
> org.apache.spark.status.AppStatusStore.asOption(AppStatusStore.scala:408)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$.lastStageNameAndDescription(StagePage.scala:1014)
>   at 
> org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:434)
>   at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
>   at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
>   at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.ui.jobs.JobDataSource.&lt;init&gt;(AllJobsPage.scala:412)
>   at org.apache.spark.ui.jobs.JobPagedTable.&lt;init&gt;(AllJobsPage.scala:504)
>   at org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:246)
>   at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:295)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> 

[jira] [Updated] (SPARK-23470) org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow

2018-02-20 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23470:
-
Priority: Blocker  (was: Major)

> org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow
> --
>
> Key: SPARK-23470
> URL: https://issues.apache.org/jira/browse/SPARK-23470
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> I was testing 2.3.0 RC3 and found that it's easy to hit "read timeout" in 
> the Spark UI. The stack dump says it was running 
> "org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription".
> {code}
> "SparkUI-59" #59 daemon prio=5 os_prio=0 tid=0x7fc15b0a3000 nid=0x8dc 
> runnable [0x7fc0ce9f8000]
>java.lang.Thread.State: RUNNABLE
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:154)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:248)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$2(InMemoryStore.java:214)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$36/1834982692.compare(Unknown
>  Source)
>   at java.util.TimSort.binarySort(TimSort.java:296)
>   at java.util.TimSort.sort(TimSort.java:239)
>   at java.util.Arrays.sort(Arrays.java:1512)
>   at java.util.ArrayList.sort(ArrayList.java:1460)
>   at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:387)
>   at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:210)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
>   at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.hasNext(InMemoryStore.java:278)
>   at 
> org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:101)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
>   at 
> org.apache.spark.status.AppStatusStore.asOption(AppStatusStore.scala:408)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$.lastStageNameAndDescription(StagePage.scala:1014)
>   at 
> org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:434)
>   at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
>   at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
>   at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.ui.jobs.JobDataSource.&lt;init&gt;(AllJobsPage.scala:412)
>   at org.apache.spark.ui.jobs.JobPagedTable.&lt;init&gt;(AllJobsPage.scala:504)
>   at org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:246)
>   at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:295)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> 

[jira] [Commented] (SPARK-23470) org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow

2018-02-20 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370496#comment-16370496
 ] 

Shixiong Zhu commented on SPARK-23470:
--

[~vanzin] [~cloud_fan]

> org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow
> --
>
> Key: SPARK-23470
> URL: https://issues.apache.org/jira/browse/SPARK-23470
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> I was testing 2.3.0 RC3 and found that it's easy to hit a "read timeout" in 
> the Spark UI. The stack dump shows it was running 
> "org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription".
> {code}
> "SparkUI-59" #59 daemon prio=5 os_prio=0 tid=0x7fc15b0a3000 nid=0x8dc 
> runnable [0x7fc0ce9f8000]
>java.lang.Thread.State: RUNNABLE
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:154)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:248)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$2(InMemoryStore.java:214)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$36/1834982692.compare(Unknown
>  Source)
>   at java.util.TimSort.binarySort(TimSort.java:296)
>   at java.util.TimSort.sort(TimSort.java:239)
>   at java.util.Arrays.sort(Arrays.java:1512)
>   at java.util.ArrayList.sort(ArrayList.java:1460)
>   at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:387)
>   at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:210)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
>   at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.hasNext(InMemoryStore.java:278)
>   at 
> org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:101)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
>   at 
> org.apache.spark.status.AppStatusStore.asOption(AppStatusStore.scala:408)
>   at 
> org.apache.spark.ui.jobs.ApiHelper$.lastStageNameAndDescription(StagePage.scala:1014)
>   at 
> org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:434)
>   at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
>   at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
>   at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:412)
>   at org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:504)
>   at org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:246)
>   at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:295)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> 

[jira] [Created] (SPARK-23470) org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow

2018-02-20 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23470:


 Summary: 
org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow
 Key: SPARK-23470
 URL: https://issues.apache.org/jira/browse/SPARK-23470
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
Reporter: Shixiong Zhu


I was testing 2.3.0 RC3 and found that it's easy to hit a "read timeout" in the 
Spark UI. The stack dump shows it was running 
"org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription".

{code}
"SparkUI-59" #59 daemon prio=5 os_prio=0 tid=0x7fc15b0a3000 nid=0x8dc 
runnable [0x7fc0ce9f8000]
   java.lang.Thread.State: RUNNABLE
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:154)
at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:248)
at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$2(InMemoryStore.java:214)
at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$36/1834982692.compare(Unknown
 Source)
at java.util.TimSort.binarySort(TimSort.java:296)
at java.util.TimSort.sort(TimSort.java:239)
at java.util.Arrays.sort(Arrays.java:1512)
at java.util.ArrayList.sort(ArrayList.java:1460)
at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:387)
at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:210)
at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
at 
java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.hasNext(InMemoryStore.java:278)
at 
org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:101)
at 
org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
at 
org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
at 
org.apache.spark.status.AppStatusStore.asOption(AppStatusStore.scala:408)
at 
org.apache.spark.ui.jobs.ApiHelper$.lastStageNameAndDescription(StagePage.scala:1014)
at 
org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:434)
at 
org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
at 
org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:412)
at org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:504)
at org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:246)
at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:295)
at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
{code}
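
The stack above shows two multiplying costs: every comparison inside TimSort goes 
through KVTypeInfo$MethodAccessor.get, i.e. a reflective Method.invoke, and 
lastStageAttempt re-sorts a view of the store once per rendered job row. A minimal 
standalone sketch of that cost pattern (hypothetical types, not the actual kvstore 
classes):

{code}
// Sketch only: mimics the pattern in the stack trace. Sorting with a
// reflection-based comparator pays a Method.invoke per comparison, and doing
// the sort once per rendered row multiplies that by the number of rows.
import java.lang.reflect.Method

case class StageData(stageId: Int, name: String)

val getter: Method = classOf[StageData].getMethod("stageId")

def lastStageAttempt(stages: Seq[StageData], stageId: Int): Option[StageData] =
  stages.sortWith { (a, b) =>
    // reflective accessor on every comparison, as in KVTypeInfo$MethodAccessor
    getter.invoke(a).asInstanceOf[Int] < getter.invoke(b).asInstanceOf[Int]
  }.reverseIterator.find(_.stageId == stageId)

// Rendering N job rows this way costs N full sorts of M stages: roughly
// N * M log M reflective calls, instead of one indexed lookup per row.
{code}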

According to the 

[jira] [Commented] (SPARK-23433) java.lang.IllegalStateException: more than one active taskSet for stage

2018-02-16 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367728#comment-16367728
 ] 

Shixiong Zhu commented on SPARK-23433:
--

[~irashid] I'm busy with other stuff and not working on this. Your approach 
sounds good to me. Please go ahead if you have time to work on this.

> java.lang.IllegalStateException: more than one active taskSet for stage
> ---
>
> Key: SPARK-23433
> URL: https://issues.apache.org/jira/browse/SPARK-23433
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Shixiong Zhu
>Priority: Major
>
> This following error thrown by DAGScheduler stopped the cluster:
> {code}
> 18/02/11 13:22:27 ERROR DAGSchedulerEventProcessLoop: 
> DAGSchedulerEventProcessLoop failed; shutting down SparkContext
> java.lang.IllegalStateException: more than one active taskSet for stage 
> 7580621: 7580621.2,7580621.1
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:229)
>   at 
> org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1193)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:1059)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:900)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:899)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at 
> org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:899)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1427)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1929)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1880)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1868)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}






[jira] [Commented] (SPARK-23433) java.lang.IllegalStateException: more than one active taskSet for stage

2018-02-15 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366421#comment-16366421
 ] 

Shixiong Zhu commented on SPARK-23433:
--

cc [~irashid]

> java.lang.IllegalStateException: more than one active taskSet for stage
> ---
>
> Key: SPARK-23433
> URL: https://issues.apache.org/jira/browse/SPARK-23433
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Shixiong Zhu
>Priority: Major
>
> This following error thrown by DAGScheduler stopped the cluster:
> {code}
> 18/02/11 13:22:27 ERROR DAGSchedulerEventProcessLoop: 
> DAGSchedulerEventProcessLoop failed; shutting down SparkContext
> java.lang.IllegalStateException: more than one active taskSet for stage 
> 7580621: 7580621.2,7580621.1
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:229)
>   at 
> org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1193)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:1059)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:900)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:899)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at 
> org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:899)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1427)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1929)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1880)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1868)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}






[jira] [Commented] (SPARK-23433) java.lang.IllegalStateException: more than one active taskSet for stage

2018-02-15 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366417#comment-16366417
 ] 

Shixiong Zhu commented on SPARK-23433:
--

{code}
18/02/11 13:22:20 INFO TaskSetManager: Finished task 17.0 in stage 7580621.1 
(TID 65577139) in 303870 ms on 10.0.246.111 (executor 24) (18/19)
18/02/11 13:22:20 INFO DAGScheduler: ShuffleMapStage 7580621 (start at 
command-2841337:340) finished in 303.880 s
18/02/11 13:22:20 INFO DAGScheduler: Resubmitting ShuffleMapStage 7580621 
(start at command-2841337:340) because some of its tasks had failed: 2, 15, 27, 
28, 41
18/02/11 13:22:27 INFO DAGScheduler: Submitting ShuffleMapStage 7580621 
(MapPartitionsRDD[2660062] at start at command-2841337:340), which has no 
missing parents
18/02/11 13:22:27 INFO DAGScheduler: Submitting 5 missing tasks from 
ShuffleMapStage 7580621 (MapPartitionsRDD[2660062] at start at 
command-2841337:340) (first 15 tasks are for partitions Vector(2, 15, 27, 28, 
41))
18/02/11 13:22:27 INFO TaskSchedulerImpl: Adding task set 7580621.2 with 5 tasks
18/02/11 13:22:27 ERROR DAGSchedulerEventProcessLoop: 
DAGSchedulerEventProcessLoop failed; shutting down SparkContext
java.lang.IllegalStateException: more than one active taskSet for stage 
7580621: 7580621.2,7580621.1
at 
org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:229)
at 
org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1193)
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:1059)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:900)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:899)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:899)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1427)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1929)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1880)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1868)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
18/02/11 13:22:27 INFO TaskSchedulerImpl: Cancelling stage 7580621
18/02/11 13:22:27 INFO TaskSchedulerImpl: Cancelling stage 7580621
18/02/11 13:22:27 INFO TaskSchedulerImpl: Stage 7580621 was cancelled
18/02/11 13:22:27 INFO DAGScheduler: ShuffleMapStage 7580621 (start at 
command-2841337:340) failed in 0.057 s due to Job aborted due to stage failure: 
Stage 7580621 cancelled
org.apache.spark.SparkException: Job aborted due to stage failure: Stage 
7580621 cancelled
18/02/11 13:22:27 WARN TaskSetManager: Lost task 18.0 in stage 7580621.1 (TID 
65577140, 10.0.144.170, executor 16): TaskKilled (Stage cancelled)
{code}

According to the above logs, I think the issue is in this line: 
https://github.com/apache/spark/blob/1dc2c1d5e85c5f404f470aeb44c1f3c22786bdea/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1281

"Task 18.0 in stage 7580621.0" finished and updated 
"shuffleStage.pendingPartitions" while "Task 18.0 in stage 7580621.1" was still 
running. Hence, once 18 of the 19 tasks in "stage 7580621.1" finished, this 
condition 
(https://github.com/apache/spark/blob/1dc2c1d5e85c5f404f470aeb44c1f3c22786bdea/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1284)
 became true and triggered "stage 7580621.2".
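
A minimal sketch of that race, with hypothetical types rather than the real 
DAGScheduler (the key assumption being that pendingPartitions is shared across 
stage attempts and the completion handler does not check the attempt id):

{code}
// Sketch only: one pendingPartitions set is shared by all attempts of a
// stage, so a straggler completion from attempt X.0 can drain it while
// attempt X.1 is still running.
import scala.collection.mutable

class Stage(numPartitions: Int) {
  val pendingPartitions: mutable.Set[Int] =
    mutable.Set(0 until numPartitions: _*)
}

def handleTaskCompletion(stage: Stage, partition: Int, attemptId: Int): Unit = {
  // Bug pattern: attemptId is ignored, so "task 18.0 in stage 7580621.0"
  // updates the bookkeeping that attempt 7580621.1 is relying on.
  stage.pendingPartitions -= partition
  if (stage.pendingPartitions.isEmpty) {
    // The scheduler now treats the attempt as finished; if some map outputs
    // are still missing, it resubmits the stage, creating taskset X.2 while
    // X.1 is alive: the "more than one active taskSet" IllegalStateException.
  }
}
{code}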



> java.lang.IllegalStateException: more than one active taskSet for stage
> ---
>
> Key: SPARK-23433
> URL: https://issues.apache.org/jira/browse/SPARK-23433
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Shixiong Zhu
>Priority: Major
>
> This following error thrown by DAGScheduler stopped the cluster:
> {code}
> 18/02/11 13:22:27 ERROR DAGSchedulerEventProcessLoop: 
> DAGSchedulerEventProcessLoop failed; shutting down SparkContext
> java.lang.IllegalStateException: more than one active taskSet for stage 
> 7580621: 7580621.2,7580621.1
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:229)
>   at 
> org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1193)
>   at 
> 

[jira] [Resolved] (SPARK-23430) Cannot sort "Executor ID" or "Host" columns in the task table

2018-02-15 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23430.
--
Resolution: Duplicate

> Cannot sort "Executor ID" or "Host" columns in the task table
> -
>
> Key: SPARK-23430
> URL: https://issues.apache.org/jira/browse/SPARK-23430
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
>  Labels: regression
>
> Click the "Executor ID" or "Host" header in the task table and it will fail:
> {code}
> java.lang.IllegalArgumentException: Invalid sort column: Executor ID
>   at org.apache.spark.ui.jobs.ApiHelper$.indexName(StagePage.scala:1009)
>   at 
> org.apache.spark.ui.jobs.TaskDataSource.sliceData(StagePage.scala:686)
>   at org.apache.spark.ui.PagedDataSource.pageData(PagedTable.scala:61)
>   at org.apache.spark.ui.PagedTable$class.table(PagedTable.scala:96)
>   at org.apache.spark.ui.jobs.TaskPagedTable.table(StagePage.scala:700)
>   at org.apache.spark.ui.jobs.StagePage.liftedTree1$1(StagePage.scala:293)
>   at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:282)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Created] (SPARK-23433) java.lang.IllegalStateException: more than one active taskSet for stage

2018-02-14 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23433:


 Summary: java.lang.IllegalStateException: more than one active 
taskSet for stage
 Key: SPARK-23433
 URL: https://issues.apache.org/jira/browse/SPARK-23433
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.1
Reporter: Shixiong Zhu


This following error thrown by DAGScheduler stopped the cluster:

{code}
18/02/11 13:22:27 ERROR DAGSchedulerEventProcessLoop: 
DAGSchedulerEventProcessLoop failed; shutting down SparkContext
java.lang.IllegalStateException: more than one active taskSet for stage 
7580621: 7580621.2,7580621.1
at 
org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:229)
at 
org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1193)
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:1059)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:900)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:899)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:899)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1427)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1929)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1880)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1868)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

{code}






[jira] [Created] (SPARK-23430) Cannot sort "Executor ID" or "Host" columns in the task table

2018-02-14 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23430:


 Summary: Cannot sort "Executor ID" or "Host" columns in the task 
table
 Key: SPARK-23430
 URL: https://issues.apache.org/jira/browse/SPARK-23430
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu


Click the "Executor ID" or "Host" header in the task table and it will fail:
{code}
java.lang.IllegalArgumentException: Invalid sort column: Executor ID
at org.apache.spark.ui.jobs.ApiHelper$.indexName(StagePage.scala:1009)
at 
org.apache.spark.ui.jobs.TaskDataSource.sliceData(StagePage.scala:686)
at org.apache.spark.ui.PagedDataSource.pageData(PagedTable.scala:61)
at org.apache.spark.ui.PagedTable$class.table(PagedTable.scala:96)
at org.apache.spark.ui.jobs.TaskPagedTable.table(StagePage.scala:700)
at org.apache.spark.ui.jobs.StagePage.liftedTree1$1(StagePage.scala:293)
at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:282)
at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
{code}
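
The exception comes from mapping a task-table header to a sort index and throwing 
for unknown headers. A hedged sketch of the failure shape (hypothetical map 
contents, not the actual StagePage code):

{code}
// Sketch only: a header -> index-name map with missing entries throws this
// exact IllegalArgumentException when a user clicks an unmapped column.
val headerToIndexName: Map[String, String] = Map(
  "Duration" -> "duration",
  "Launch Time" -> "launchTime"
  // "Executor ID" and "Host" are missing, so clicking those headers fails
)

def indexName(header: String): String =
  headerToIndexName.getOrElse(
    header,
    throw new IllegalArgumentException(s"Invalid sort column: $header"))

// Fix shape: register every rendered header in the map (or fall back to a
// default index) so each visible column stays sortable.
{code}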






[jira] [Commented] (SPARK-23351) checkpoint corruption in long running application

2018-02-13 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363115#comment-16363115
 ] 

Shixiong Zhu commented on SPARK-23351:
--

[~davidahern] It's better to ask the vendor for support. They may have a 
different release cycle.

> checkpoint corruption in long running application
> -
>
> Key: SPARK-23351
> URL: https://issues.apache.org/jira/browse/SPARK-23351
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: David Ahern
>Priority: Major
>
> Hi, after leaving my (somewhat high-volume) Structured Streaming application 
> running for some time, I get the following exception.  The same exception 
> also repeats when I try to restart the application.  The only way to get the 
> application running again is to clear the checkpoint directory, which is far 
> from ideal.
> Maybe a stream is not being flushed/closed properly internally by Spark when 
> checkpointing?
>  
>  User class threw exception: 
> org.apache.spark.sql.streaming.StreamingQueryException: Job aborted due to 
> stage failure: Task 55 in stage 1.0 failed 4 times, most recent failure: Lost 
> task 55.3 in stage 1.0 (TID 240, gbslixaacspa04u.metis.prd, executor 2): 
> java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$readSnapshotFile(HDFSBackedStateStoreProvider.scala:481)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
>  at scala.Option.getOrElse(Option.scala:121)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:265)
>  at 
> org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:200)
>  at 
> org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:108)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)






[jira] [Updated] (SPARK-23400) Add the extra constructors for ScalaUDF

2018-02-13 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23400:
-
Fix Version/s: 2.4.0

> Add the extra constructors for ScalaUDF
> ---
>
> Key: SPARK-23400
> URL: https://issues.apache.org/jira/browse/SPARK-23400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> Over the last few releases, we changed the interface of ScalaUDF. Unfortunately, 
> some Spark packages (e.g., spark-deep-learning) are using our internal class 
> `ScalaUDF`. In release 2.3, we added new parameters to this class, so those 
> users hit binary compatibility issues and got the exception:
> > java.lang.NoSuchMethodError: 
> > org.apache.spark.sql.catalyst.expressions.ScalaUDF.<init>(Ljava/lang/Object;Lorg/apache/spark/sql/types/DataType;Lscala/collection/Seq;Lscala/collection/Seq;Lscala/Option;)V
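
A hedged sketch of the fix the title describes: re-add a constructor with the old 
arity from the NoSuchMethodError signature and delegate to the new primary one. 
The stub types and the defaulted parameter below are assumptions, not the actual 
Spark source:

{code}
// Sketch only. The missing JVM signature ends in ...Seq;Lscala/Option;)V,
// i.e. a 5-argument constructor; keeping an auxiliary constructor with that
// arity lets packages compiled against the old class still link.
trait DataType
trait Expression

case class ScalaUDF(
    function: AnyRef,
    dataType: DataType,
    children: Seq[Expression],
    inputTypes: Seq[DataType],
    udfName: Option[String],
    nullable: Boolean) {  // hypothetical parameter added in a later release

  // Old 5-argument constructor, restored for binary compatibility; the new
  // parameter gets a conservative default.
  def this(
      function: AnyRef,
      dataType: DataType,
      children: Seq[Expression],
      inputTypes: Seq[DataType],
      udfName: Option[String]) =
    this(function, dataType, children, inputTypes, udfName, nullable = true)
}
{code}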






[jira] [Resolved] (SPARK-23400) Add the extra constructors for ScalaUDF

2018-02-13 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23400.
--
   Resolution: Fixed
Fix Version/s: 2.3.1

Issue resolved by pull request 20591
[https://github.com/apache/spark/pull/20591]

> Add the extra constructors for ScalaUDF
> ---
>
> Key: SPARK-23400
> URL: https://issues.apache.org/jira/browse/SPARK-23400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Major
> Fix For: 2.3.1
>
>
> Over the last few releases, we changed the interface of ScalaUDF. Unfortunately, 
> some Spark packages (e.g., spark-deep-learning) are using our internal class 
> `ScalaUDF`. In release 2.3, we added new parameters to this class, so those 
> users hit binary compatibility issues and got the exception:
> > java.lang.NoSuchMethodError: 
> > org.apache.spark.sql.catalyst.expressions.ScalaUDF.<init>(Ljava/lang/Object;Lorg/apache/spark/sql/types/DataType;Lscala/collection/Seq;Lscala/collection/Seq;Lscala/Option;)V






[jira] [Commented] (SPARK-23351) checkpoint corruption in long running application

2018-02-08 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357632#comment-16357632
 ] 

Shixiong Zhu commented on SPARK-23351:
--

I believe this should be resolved in 
https://issues.apache.org/jira/browse/SPARK-21696. Could you try Spark 2.2.1?

> checkpoint corruption in long running application
> -
>
> Key: SPARK-23351
> URL: https://issues.apache.org/jira/browse/SPARK-23351
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: David Ahern
>Priority: Major
>
> Hi, after leaving my (somewhat high-volume) Structured Streaming application 
> running for some time, I get the following exception.  The same exception 
> also repeats when I try to restart the application.  The only way to get the 
> application running again is to clear the checkpoint directory, which is far 
> from ideal.
> Maybe a stream is not being flushed/closed properly internally by Spark when 
> checkpointing?
>  
>  User class threw exception: 
> org.apache.spark.sql.streaming.StreamingQueryException: Job aborted due to 
> stage failure: Task 55 in stage 1.0 failed 4 times, most recent failure: Lost 
> task 55.3 in stage 1.0 (TID 240, gbslixaacspa04u.metis.prd, executor 2): 
> java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$readSnapshotFile(HDFSBackedStateStoreProvider.scala:481)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
>  at scala.Option.getOrElse(Option.scala:121)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:265)
>  at 
> org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:200)
>  at 
> org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:108)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
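
For what it's worth, the symptom is consistent with a snapshot/delta file that was 
not fully flushed before the writer died: the next readInt() at the truncation 
point raises EOFException. A standalone sketch (plain JDK I/O, not Spark code):

{code}
// Sketch only: a record cut short by a crash makes the reader's readInt()
// throw java.io.EOFException, matching the top frame of the stack above.
import java.io._

val f = File.createTempFile("snapshot", ".delta")
val out = new DataOutputStream(new FileOutputStream(f))
out.writeInt(42)               // one complete 4-byte record
out.write(Array[Byte](1, 2))   // next record truncated: only 2 of 4 bytes
out.close()

val in = new DataInputStream(new FileInputStream(f))
println(in.readInt())          // 42: the intact record reads fine
try in.readInt()               // hits end of file mid-int -> EOFException
catch { case e: EOFException => println(s"truncated file: $e") }
finally in.close()
{code}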






[jira] [Commented] (SPARK-23351) checkpoint corruption in long running application

2018-02-08 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357629#comment-16357629
 ] 

Shixiong Zhu commented on SPARK-23351:
--

What's your file system? HDFS?

> checkpoint corruption in long running application
> -
>
> Key: SPARK-23351
> URL: https://issues.apache.org/jira/browse/SPARK-23351
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: David Ahern
>Priority: Major
>
> Hi, after leaving my (somewhat high-volume) Structured Streaming application 
> running for some time, I get the following exception.  The same exception 
> also repeats when I try to restart the application.  The only way to get the 
> application running again is to clear the checkpoint directory, which is far 
> from ideal.
> Maybe a stream is not being flushed/closed properly internally by Spark when 
> checkpointing?
>  
>  User class threw exception: 
> org.apache.spark.sql.streaming.StreamingQueryException: Job aborted due to 
> stage failure: Task 55 in stage 1.0 failed 4 times, most recent failure: Lost 
> task 55.3 in stage 1.0 (TID 240, gbslixaacspa04u.metis.prd, executor 2): 
> java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$readSnapshotFile(HDFSBackedStateStoreProvider.scala:481)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
>  at scala.Option.getOrElse(Option.scala:121)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
>  at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:265)
>  at 
> org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:200)
>  at 
> org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:108)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)






[jira] [Updated] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23326:
-
Description: 
Run the following code and check the UI
{code}
sc.makeRDD(1 to 1, 1).foreach { i => Thread.sleep(3) }
{code}

You will see "Scheduler Delay" of a task is almost the same as "Duration" until 
the task finishes. That's really confusing.

In Spark 2.2,  "Scheduler Delay" will be 0 until the task finishes. This is 
also not correct but less confusing.
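
For context, the UI can only derive scheduler delay from other per-task metrics, 
and most of those stay at zero until the task reports back. A hedged sketch of 
the usual derivation (the metric names are assumptions, not Spark's exact fields):

{code}
// Sketch only: while a task runs, everything except wall-clock duration is
// still 0, so the derived "delay" collapses to ~duration, which is the
// confusing behavior described above.
def schedulerDelay(
    duration: Long,            // wall-clock time so far
    runTime: Long,             // executor run time, 0 until the task finishes
    deserializeTime: Long,     // task deserialization, 0 until finished
    resultSerializeTime: Long, // result serialization, 0 until finished
    gettingResultTime: Long): Long =
  math.max(0L, duration - runTime - deserializeTime -
    resultSerializeTime - gettingResultTime)

schedulerDelay(30000L, 0L, 0L, 0L, 0L)        // running task: 30000 == duration
schedulerDelay(30000L, 29800L, 50L, 20L, 0L)  // finished task: 130
{code}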

> "Scheduler Delay" of a task is confusing
> 
>
> Key: SPARK-23326
> URL: https://issues.apache.org/jira/browse/SPARK-23326
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> Run the following code and check the UI
> {code}
> sc.makeRDD(1 to 1, 1).foreach { i => Thread.sleep(3) }
> {code}
> You will see "Scheduler Delay" of a task is almost the same as "Duration" 
> until the task finishes. That's really confusing.
> In Spark 2.2,  "Scheduler Delay" will be 0 until the task finishes. This is 
> also not correct but less confusing.






[jira] [Updated] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23326:
-
Environment: (was: Run the following code and check the UI
{code}
sc.makeRDD(1 to 1, 1).foreach { i => Thread.sleep(3) }
{code}

You will see "Scheduler Delay" of a task is almost the same as "Duration" until 
the task finishes.)

> "Scheduler Delay" of a task is confusing
> 
>
> Key: SPARK-23326
> URL: https://issues.apache.org/jira/browse/SPARK-23326
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Major
>







[jira] [Created] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23326:


 Summary: "Scheduler Delay" of a task is confusing
 Key: SPARK-23326
 URL: https://issues.apache.org/jira/browse/SPARK-23326
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
 Environment: Run the following code and check the UI
{code}
sc.makeRDD(1 to 1, 1).foreach { i => Thread.sleep(3) }
{code}

You will see "Scheduler Delay" of a task is almost the same as "Duration" until 
the task finishes.
Reporter: Shixiong Zhu









[jira] [Commented] (SPARK-23294) Spark Streaming + Rate source + Console Sink : Receiver MaxRate is violated

2018-02-01 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349411#comment-16349411
 ] 

Shixiong Zhu commented on SPARK-23294:
--

[~rmatte] The configurations you posted in the ticket are for the old DStream 
API; they are not used by Structured Streaming. Structured Streaming doesn't need 
to support receiver-style backpressure because it has no receiver-based sources.
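
For example, throttling in Structured Streaming is configured on the source 
itself; a short sketch (standard source options, local master assumed):

{code}
// In Structured Streaming the rate knob lives on the source: rowsPerSecond
// for the rate source, maxOffsetsPerTrigger for the Kafka source. The
// spark.streaming.* DStream configs from the report have no effect here.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rate-demo")
  .getOrCreate()

val stream = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 100L)  // effective limit, unlike receiver.maxRate
  .load()

stream.writeStream
  .format("console")
  .start()
  .awaitTermination()
{code}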

> Spark Streaming + Rate source + Console Sink : Receiver MaxRate is violated
> ---
>
> Key: SPARK-23294
> URL: https://issues.apache.org/jira/browse/SPARK-23294
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.1
>Reporter: Ravinder Matte
>Assignee: Takeshi Yamamuro
>Priority: Minor
>  Labels: ConsoleSink, RateSource, backpressure, maxRate
>
> Using the following configs while building the SparkSession:
> ("spark.streaming.backpressure.enabled", "true")
> ("spark.streaming.receiver.maxRate", "100")
> ("spark.streaming.backpressure.initialRate", "100")
>  
> Source: Rate source with the following option:
> rowsPerSecond=10
> Sink: Console sink.
>  
> I expect the processing rate to be capped at 100 rows per second, but maxRate 
> is ignored and the streaming job keeps processing at a rate of 10.
>  






[jira] [Resolved] (SPARK-23294) Spark Streaming + Rate source + Console Sink : Receiver MaxRate is violated

2018-02-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23294.
--
Resolution: Not A Problem

> Spark Streaming + Rate source + Console Sink : Receiver MaxRate is violated
> ---
>
> Key: SPARK-23294
> URL: https://issues.apache.org/jira/browse/SPARK-23294
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.1
>Reporter: Ravinder Matte
>Assignee: Takeshi Yamamuro
>Priority: Minor
>  Labels: ConsoleSink, RateSource, backpressure, maxRate
>
> Using the following configs while building the SparkSession:
> ("spark.streaming.backpressure.enabled", "true")
> ("spark.streaming.receiver.maxRate", "100")
> ("spark.streaming.backpressure.initialRate", "100")
>  
> Source: Rate source with the following option:
> rowsPerSecond=10
> Sink: Console sink.
>  
> I expect the processing rate to be capped at 100 rows per second, but maxRate 
> is ignored and the streaming job keeps processing at a rate of 10.
>  






[jira] [Assigned] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-23307:


Assignee: Shixiong Zhu

> Spark UI should sort jobs/stages with the completed timestamp before cleaning 
> up them
> -
>
> Key: SPARK-23307
> URL: https://issues.apache.org/jira/browse/SPARK-23307
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
>
> When you have a long-running job, it may be deleted from the UI quickly after 
> it completes if you happen to run a small job after it. It's pretty annoying 
> when you run lots of jobs in the same driver concurrently (e.g., running 
> multiple Structured Streaming queries). We should sort jobs/stages by the 
> completed timestamp before cleaning them up.
> In 2.2, Spark has a separate buffer for completed jobs/stages, so it doesn't 
> need to sort them.
> The behavior I expect:
> Set "spark.ui.retainedJobs" to 10 and run the following code; job 0 should 
> be kept in the Spark UI.
>  
> {code:java}
> new Thread() {
>   override def run() {
>     // job 0
>     sc.makeRDD(1 to 1, 1).foreach { i =>
>     Thread.sleep(1)
>    }
>   }
> }.start()
> Thread.sleep(1000)
> for (_ <- 1 to 20) {
>   new Thread() {
>     override def run() {
>       sc.makeRDD(1 to 1, 1).foreach { i =>
>       }
>     }
>   }.start()
> }
> Thread.sleep(15000)
>   sc.makeRDD(1 to 1, 1).foreach { i =>
> }
> {code}






[jira] [Updated] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23307:
-
Description: 
When you have a long-running job, it may be deleted from the UI quickly after it 
completes if you happen to run a small job after it. It's pretty annoying when 
you run lots of jobs in the same driver concurrently (e.g., running multiple 
Structured Streaming queries). We should sort jobs/stages by the completed 
timestamp before cleaning them up.

In 2.2, Spark has a separate buffer for completed jobs/stages, so it doesn't 
need to sort them.

The behavior I expect:

Set "spark.ui.retainedJobs" to 10 and run the following code; job 0 should be 
kept in the Spark UI.

 
{code:java}
new Thread() {
  override def run() {

    // job 0
    sc.makeRDD(1 to 1, 1).foreach { i =>
    Thread.sleep(1)
   }
  }
}.start()

Thread.sleep(1000)

for (_ <- 1 to 20) {
  new Thread() {
    override def run() {
      sc.makeRDD(1 to 1, 1).foreach { i =>
      }
    }
  }.start()
}

Thread.sleep(15000)
  sc.makeRDD(1 to 1, 1).foreach { i =>
}

{code}

  was:
When you have a long running job, it may be deleted from UI quickly when it 
completes, if you happen to run a small job after it. It's pretty annoying when 
you run lots of jobs in the same driver concurrently (e.g., running multiple 
Structured Streaming queries). We should sort jobs/stages with the completed 
timestamp before cleaning up them.

In 2.2, Spark has a separated buffer for completed jobs/stages, so it doesn't 
need to sort the jobs/stages.

What the behavior I expect:

Set "spark.ui.retainedJobs" to 10 and run the following codes, job 0 should be 
kept in the Spark UI.

 

{code}

new Thread() {
  override def run() {

    // job 0
    sc.makeRDD(1 to 1, 1).foreach { i =>
    Thread.sleep(1)
   }
  }
}.start()

Thread.sleep(1000)

for (_ <- 1 to 20) {
  new Thread() {
    override def run() {
      sc.makeRDD(1 to 1, 1).foreach { i =>
      }
    }
  }.start()
}

Thread.sleep(15000)
  sc.makeRDD(1 to 1, 1).foreach { i =>
}

{code}


> Spark UI should sort jobs/stages with the completed timestamp before cleaning 
> up them
> -
>
> Key: SPARK-23307
> URL: https://issues.apache.org/jira/browse/SPARK-23307
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> When you have a long-running job, it may be deleted from the UI quickly after 
> it completes if you happen to run a small job after it. It's pretty annoying 
> when you run lots of jobs in the same driver concurrently (e.g., running 
> multiple Structured Streaming queries). We should sort jobs/stages by the 
> completed timestamp before cleaning them up.
> In 2.2, Spark has a separate buffer for completed jobs/stages, so it doesn't 
> need to sort them.
> The behavior I expect:
> Set "spark.ui.retainedJobs" to 10 and run the following code; job 0 should 
> be kept in the Spark UI.
>  
> {code:java}
> new Thread() {
>   override def run() {
>     // job 0
>     sc.makeRDD(1 to 1, 1).foreach { i =>
>     Thread.sleep(1)
>    }
>   }
> }.start()
> Thread.sleep(1000)
> for (_ <- 1 to 20) {
>   new Thread() {
>     override def run() {
>       sc.makeRDD(1 to 1, 1).foreach { i =>
>       }
>     }
>   }.start()
> }
> Thread.sleep(15000)
>   sc.makeRDD(1 to 1, 1).foreach { i =>
> }
> {code}






[jira] [Commented] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349288#comment-16349288
 ] 

Shixiong Zhu commented on SPARK-23307:
--

cc [~vanzin] [~cloud_fan]

> Spark UI should sort jobs/stages with the completed timestamp before cleaning 
> up them
> -
>
> Key: SPARK-23307
> URL: https://issues.apache.org/jira/browse/SPARK-23307
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> When you have a long-running job, it may be deleted from the UI quickly after 
> it completes if you happen to run a small job after it. It's pretty annoying 
> when you run lots of jobs in the same driver concurrently (e.g., running 
> multiple Structured Streaming queries). We should sort jobs/stages by the 
> completed timestamp before cleaning them up.
> In 2.2, Spark has a separate buffer for completed jobs/stages, so it doesn't 
> need to sort them.
> The behavior I expect:
> Set "spark.ui.retainedJobs" to 10 and run the following code; job 0 should 
> be kept in the Spark UI.
>  
> {code}
> new Thread() {
>   override def run() {
>     // job 0
>     sc.makeRDD(1 to 1, 1).foreach { i =>
>     Thread.sleep(1)
>    }
>   }
> }.start()
> Thread.sleep(1000)
> for (_ <- 1 to 20) {
>   new Thread() {
>     override def run() {
>       sc.makeRDD(1 to 1, 1).foreach { i =>
>       }
>     }
>   }.start()
> }
> Thread.sleep(15000)
>   sc.makeRDD(1 to 1, 1).foreach { i =>
> }
> {code}






[jira] [Created] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23307:


 Summary: Spark UI should sort jobs/stages with the completed 
timestamp before cleaning up them
 Key: SPARK-23307
 URL: https://issues.apache.org/jira/browse/SPARK-23307
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
Reporter: Shixiong Zhu


When you have a long-running job, it may be deleted from the UI quickly after it 
completes if you happen to run a small job after it. It's pretty annoying when 
you run lots of jobs in the same driver concurrently (e.g., running multiple 
Structured Streaming queries). We should sort jobs/stages by the completed 
timestamp before cleaning them up.

In 2.2, Spark has a separate buffer for completed jobs/stages, so it doesn't 
need to sort them.

The behavior I expect:

Set "spark.ui.retainedJobs" to 10 and run the following code; job 0 should be 
kept in the Spark UI.

 

{code}

new Thread() {
  override def run() {
    // job 0
    sc.makeRDD(1 to 1, 1).foreach { i =>
      Thread.sleep(1)
    }
  }
}.start()

Thread.sleep(1000)

for (_ <- 1 to 20) {
  new Thread() {
    override def run() {
      sc.makeRDD(1 to 1, 1).foreach { i =>
      }
    }
  }.start()
}

Thread.sleep(15000)
sc.makeRDD(1 to 1, 1).foreach { i =>
}

{code}
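
A minimal sketch of the eviction order I have in mind (hypothetical names, not 
the actual AppStatusListener code): when the retained limit is exceeded, evict 
completed jobs oldest-completion-first, so a job that finished recently -- even 
if it started long ago -- survives a burst of short jobs.

{code}
// Sketch only: assumes a JobData-like record with an optional completion time.
case class JobInfo(jobId: Int, completionTime: Option[Long])

// Running jobs (no completion time) are never evicted; among completed jobs,
// the ones that finished earliest are dropped first.
def evict(jobs: Seq[JobInfo], retainedJobs: Int): Seq[JobInfo] = {
  val excess = jobs.size - retainedJobs
  if (excess <= 0) jobs
  else {
    val toDrop = jobs.filter(_.completionTime.isDefined)
      .sortBy(_.completionTime.get)
      .take(excess)
      .map(_.jobId)
      .toSet
    jobs.filterNot(j => toDrop.contains(j.jobId))
  }
}
{code}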



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23289) OneForOneBlockFetcher.DownloadCallback.onData may write just a part of data

2018-01-31 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23289:


 Summary: OneForOneBlockFetcher.DownloadCallback.onData may write 
just a part of data
 Key: SPARK-23289
 URL: https://issues.apache.org/jira/browse/SPARK-23289
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.1, 2.2.0, 2.3.0
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23245) KafkaContinuousSourceSuite may hang forever

2018-01-26 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-23245:


Assignee: Jose Torres

> KafkaContinuousSourceSuite may hang forever
> ---
>
> Key: SPARK-23245
> URL: https://issues.apache.org/jira/browse/SPARK-23245
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Jose Torres
>Priority: Major
> Fix For: 2.3.0
>
>
> The following stream execution thread is holding the lock on 
> "IncrementalExecution".
> {code}
> "stream execution thread for [id = 83790664-fd66-4645-b55a-37c17897c691, 
> runId = febf6c2a-1372-4c83-998c-90984a9a02c2]" #2653 daemon prio=5 os_prio=0 
> tid=0x7ff511ae2000 nid=0xcde1 waiting on condition [0x7ff32ebbb000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x00071a25fa80> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
>  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
>  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
>  at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222)
>  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:731)
>  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2109)
>  at 
> org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:78)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:135)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$3.apply(SparkPlan.scala:167)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:164)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:112)
>  - locked <0x00071a256e10> (a 
> org.apache.spark.sql.execution.streaming.IncrementalExecution)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:273)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:273)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:88)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:124)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3.apply(ContinuousExecution.scala:273)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3.apply(ContinuousExecution.scala:273)
>  at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:60)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.runContinuous(ContinuousExecution.scala:271)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.runActivatedStream(ContinuousExecution.scala:94)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:291)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:201)
> {code}
>  
> But the test thread is waiting for the same lock
> {code}
> "pool-1-thread-1-ScalaTest-running-KafkaContinuousSourceSuite" #20 prio=5 
> os_prio=0 tid=0x7ff5b4f1e800 nid=0x5566 waiting for monitor entry 
> [0x7ff51cffb000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at ...
> {code}

[jira] [Resolved] (SPARK-23245) KafkaContinuousSourceSuite may hang forever

2018-01-26 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23245.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20413
[https://github.com/apache/spark/pull/20413]

> KafkaContinuousSourceSuite may hang forever
> ---
>
> Key: SPARK-23245
> URL: https://issues.apache.org/jira/browse/SPARK-23245
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Major
> Fix For: 2.3.0
>
>
> The following stream execution thread is holding the lock on 
> "IncrementalExecution".
> {code}
> "stream execution thread for [id = 83790664-fd66-4645-b55a-37c17897c691, 
> runId = febf6c2a-1372-4c83-998c-90984a9a02c2]" #2653 daemon prio=5 os_prio=0 
> tid=0x7ff511ae2000 nid=0xcde1 waiting on condition [0x7ff32ebbb000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x00071a25fa80> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
>  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
>  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
>  at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222)
>  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:731)
>  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2109)
>  at 
> org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:78)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:135)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$3.apply(SparkPlan.scala:167)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:164)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:112)
>  - locked <0x00071a256e10> (a 
> org.apache.spark.sql.execution.streaming.IncrementalExecution)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:273)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:273)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:88)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:124)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3.apply(ContinuousExecution.scala:273)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3.apply(ContinuousExecution.scala:273)
>  at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:60)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.runContinuous(ContinuousExecution.scala:271)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.runActivatedStream(ContinuousExecution.scala:94)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:291)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:201)
> {code}
>  
> But the test thread is waiting for the same lock
> {code}
> "pool-1-thread-1-ScalaTest-running-KafkaContinuousSourceSuite" #20 prio=5 
> os_prio=0 tid=0x7ff5b4f1e800 nid=0x5566 waiting for monitor entry 
> [0x7ff51cffb000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at ...
> {code}

[jira] [Created] (SPARK-23245) KafkaContinuousSourceSuite may hang forever

2018-01-26 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23245:


 Summary: KafkaContinuousSourceSuite may hang forever
 Key: SPARK-23245
 URL: https://issues.apache.org/jira/browse/SPARK-23245
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming, Tests
Affects Versions: 2.3.0
Reporter: Shixiong Zhu


The following stream execution thread is holding the lock on 
"IncrementalExecution".

{code}

"stream execution thread for [id = 83790664-fd66-4645-b55a-37c17897c691, runId 
= febf6c2a-1372-4c83-998c-90984a9a02c2]" #2653 daemon prio=5 os_prio=0 
tid=0x7ff511ae2000 nid=0xcde1 waiting on condition [0x7ff32ebbb000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x00071a25fa80> (a 
scala.concurrent.impl.Promise$CompletionLatch)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
 at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222)
 at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:731)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2109)
 at 
org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:78)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:135)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$3.apply(SparkPlan.scala:167)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:164)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:112)
 - locked <0x00071a256e10> (a 
org.apache.spark.sql.execution.streaming.IncrementalExecution)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
 at 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:273)
 at 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:273)
 at 
org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:88)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:124)
 at 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3.apply(ContinuousExecution.scala:273)
 at 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3.apply(ContinuousExecution.scala:273)
 at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:60)
 at 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.runContinuous(ContinuousExecution.scala:271)
 at 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.runActivatedStream(ContinuousExecution.scala:94)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:291)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:201)

{code}

 

But the test thread is waiting for the same lock

{code}

"pool-1-thread-1-ScalaTest-running-KafkaContinuousSourceSuite" #20 prio=5 
os_prio=0 tid=0x7ff5b4f1e800 nid=0x5566 waiting for monitor entry 
[0x7ff51cffb000]
 java.lang.Thread.State: BLOCKED (on object monitor)
 at 
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:109)
 - waiting to lock <0x00071a256e10> (a 
org.apache.spark.sql.execution.streaming.IncrementalExecution)
 at 
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:109)
 at 
org.apache.spark.sql.streaming.StreamTest$$anonfun$liftedTree1$1$1$$anonfun$apply$25.apply(StreamTest.scala:475)
 at ...
{code}
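
The hang boils down to a hold-while-blocked pattern: one thread parks waiting 
for an event while holding a monitor, and the only thread that could produce 
the event is blocked on that same monitor. A minimal standalone sketch (not 
Spark code):

{code}
import java.util.concurrent.CountDownLatch

// This program never terminates, by design: it reproduces the hang shape.
object HoldWhileBlocked extends App {
  val lock = new Object
  val latch = new CountDownLatch(1)

  // Like the stream execution thread: holds the IncrementalExecution-style
  // monitor while parked waiting for something else to happen.
  new Thread() {
    override def run(): Unit = lock.synchronized { latch.await() }
  }.start()

  Thread.sleep(100) // let the first thread grab the monitor

  // Like the test thread: must enter the same monitor before it can signal
  // the first thread, so both threads wait forever.
  new Thread() {
    override def run(): Unit = lock.synchronized { latch.countDown() }
  }.start()
}
{code}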

[jira] [Resolved] (SPARK-23242) Don't run tests in KafkaSourceSuiteBase twice

2018-01-26 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23242.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20412
[https://github.com/apache/spark/pull/20412]

> Don't run tests in KafkaSourceSuiteBase twice
> -
>
> Key: SPARK-23242
> URL: https://issues.apache.org/jira/browse/SPARK-23242
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
> Fix For: 2.3.0
>
>
> KafkaSourceSuiteBase should be an abstract class; otherwise KafkaSourceSuiteBase 
> will also run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23242) Don't run tests in KafkaSourceSuiteBase twice

2018-01-26 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23242:
-
Description: KafkaSourceSuiteBase should be an abstract class; otherwise 
KafkaSourceSuiteBase will also run.

> Don't run tests in KafkaSourceSuiteBase twice
> -
>
> Key: SPARK-23242
> URL: https://issues.apache.org/jira/browse/SPARK-23242
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
>
> KafkaSourceSuiteBase should be an abstract class; otherwise KafkaSourceSuiteBase 
> will also run.
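> For reference, a sketch of the ScalaTest pattern at play (hypothetical suite 
> names, not the real Kafka suites): test discovery instantiates and runs every 
> concrete Suite subclass, so a concrete shared base runs once by itself and 
> again through each subclass; declaring it abstract leaves only the subclasses 
> to run.
> {code}
> import org.scalatest.FunSuite
> 
> // As a concrete class this base would be discovered and run on its own,
> // duplicating the shared tests; `abstract` prevents that.
> abstract class SharedKafkaTests extends FunSuite {
>   test("shared behavior") { assert(1 + 1 == 2) }
> }
> 
> class MicroBatchKafkaSuite extends SharedKafkaTests
> class ContinuousKafkaSuite extends SharedKafkaTests
> {code}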



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23242) Don't run tests in KafkaSourceSuiteBase twice

2018-01-26 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23242:
-
Component/s: Tests

> Don't run tests in KafkaSourceSuiteBase twice
> -
>
> Key: SPARK-23242
> URL: https://issues.apache.org/jira/browse/SPARK-23242
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.3.0
> Environment: KafkaSourceSuiteBase should be abstract class, otherwise 
> KafkaSourceSuiteBase will also run.
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23242) Don't run tests in KafkaSourceSuiteBase twice

2018-01-26 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23242:
-
Environment: (was: KafkaSourceSuiteBase should be abstract class, 
otherwise KafkaSourceSuiteBase will also run.)

> Don't run tests in KafkaSourceSuiteBase twice
> -
>
> Key: SPARK-23242
> URL: https://issues.apache.org/jira/browse/SPARK-23242
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23242) Don't run tests in KafkaSourceSuiteBase twice

2018-01-26 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23242:


 Summary: Don't run tests in KafkaSourceSuiteBase twice
 Key: SPARK-23242
 URL: https://issues.apache.org/jira/browse/SPARK-23242
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.3.0
 Environment: KafkaSourceSuiteBase should be abstract class, otherwise 
KafkaSourceSuiteBase will also run.
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-24 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338427#comment-16338427
 ] 

Shixiong Zhu commented on SPARK-23206:
--

We can also just add more information to the metrics system and let the 
external system store the metrics data and display it.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: MemoryTuningMetricsDesignDoc.pdf
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used for caching and propagating internal data 
> across the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.
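> A minimal sketch of the peak tracking described above (hypothetical names, 
> not the proposed SparkListener changes): each heartbeat carries a memory 
> sample, and only a per-executor running maximum is kept, which is what keeps 
> the extra logging small.
> {code}
> // Hypothetical shape of the per-heartbeat sample the proposal would add.
> case class MemorySample(jvmUsed: Long, execution: Long, storage: Long) {
>   def unified: Long = execution + storage
> }
> 
> // Fold a new sample into the stored peaks for one executor; only these
> // peaks (per stage per executor) would reach the history log.
> def updatePeaks(peak: MemorySample, sample: MemorySample): MemorySample =
>   MemorySample(
>     math.max(peak.jvmUsed, sample.jvmUsed),
>     math.max(peak.execution, sample.execution),
>     math.max(peak.storage, sample.storage))
> {code}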



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-24 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338421#comment-16338421
 ] 

Shixiong Zhu commented on SPARK-23206:
--

Also cc [~jerryshao] since you were working on the metrics system.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: MemoryTuningMetricsDesignDoc.pdf
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used for caching and propagating internal data 
> across the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-24 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338396#comment-16338396
 ] 

Shixiong Zhu commented on SPARK-23206:
--

cc [~vanzin] 

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: MemoryTuningMetricsDesignDoc.pdf
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used for caching and propagating internal data 
> across the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23198) Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution

2018-01-24 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-23198:


Assignee: Dongjoon Hyun

> Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test 
> ContinuousExecution
> -
>
> Key: SPARK-23198
> URL: https://issues.apache.org/jira/browse/SPARK-23198
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.3.0
>
>
> Currently, `KafkaContinuousSourceStressForDontFailOnDataLossSuite` runs on 
> `MicroBatchExecution`. It should test `ContinuousExecution`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23198) Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution

2018-01-24 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23198.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20374
[https://github.com/apache/spark/pull/20374]

> Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test 
> ContinuousExecution
> -
>
> Key: SPARK-23198
> URL: https://issues.apache.org/jira/browse/SPARK-23198
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.3.0
>
>
> Currently, `KafkaContinuousSourceStressForDontFailOnDataLossSuite` runs on 
> `MicroBatchExecution`. It should test `ContinuousExecution`.
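> For context, a sketch of how a suite opts into `ContinuousExecution` (the 
> Kafka options shown are placeholders, and an active SparkSession named 
> `spark` is assumed): without a continuous trigger a streaming query runs on 
> `MicroBatchExecution`, while `Trigger.Continuous` selects the continuous 
> engine.
> {code}
> import org.apache.spark.sql.streaming.Trigger
> 
> // The default trigger would run this on MicroBatchExecution;
> // Trigger.Continuous selects ContinuousExecution instead.
> val query = spark.readStream
>   .format("kafka")
>   .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
>   .option("subscribe", "topic")                        // placeholder
>   .load()
>   .writeStream
>   .format("console")
>   .trigger(Trigger.Continuous("1 second"))
>   .start()
> {code}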



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23184) All jobs page is broken when some stage is missing

2018-01-22 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23184.
--
   Resolution: Duplicate
Fix Version/s: 2.3.0

> All jobs page is broken when some stage is missing
> --
>
> Key: SPARK-23184
> URL: https://issues.apache.org/jira/browse/SPARK-23184
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
> Fix For: 2.3.0
>
>
> {code}
> h2. HTTP ERROR 500
> Problem accessing /jobs/. Reason:
> Server Error
>  
> h3. Caused by:
> java.util.NoSuchElementException: No stage with id 44959 at 
> org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:104)
>  at 
> org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:430)
>  at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408)
>  at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.immutable.List.foreach(List.scala:381) at 
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
>  at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:408) at 
> org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:502) at 
> org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:244) at 
> org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:293) at 
> org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
> org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) 
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
> at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.eclipse.jetty.server.Server.handle(Server.java:534) at 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) 
> at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> This is the index page. It should not crash even if a stage is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23184) All jobs page is broken when some stage is missing

2018-01-22 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335155#comment-16335155
 ] 

Shixiong Zhu commented on SPARK-23184:
--

Not yet. But I see it's a duplicate after reading the patch.

> All jobs page is broken when some stage is missing
> --
>
> Key: SPARK-23184
> URL: https://issues.apache.org/jira/browse/SPARK-23184
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
> Fix For: 2.3.0
>
>
> {code}
> h2. HTTP ERROR 500
> Problem accessing /jobs/. Reason:
> Server Error
>  
> h3. Caused by:
> java.util.NoSuchElementException: No stage with id 44959 at 
> org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:104)
>  at 
> org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:430)
>  at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408)
>  at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.immutable.List.foreach(List.scala:381) at 
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
>  at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:408) at 
> org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:502) at 
> org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:244) at 
> org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:293) at 
> org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
> org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) 
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
> at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.eclipse.jetty.server.Server.handle(Server.java:534) at 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) 
> at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> This is the index page. It should not crash even if a stage is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23184) All jobs page is broken when some stage is missing

2018-01-22 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335131#comment-16335131
 ] 

Shixiong Zhu commented on SPARK-23184:
--

cc [~vanzin] [~smurakozi] [~cloud_fan]

> All jobs page is broken when some stage is missing
> --
>
> Key: SPARK-23184
> URL: https://issues.apache.org/jira/browse/SPARK-23184
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> {code}
> h2. HTTP ERROR 500
> Problem accessing /jobs/. Reason:
> Server Error
>  
> h3. Caused by:
> java.util.NoSuchElementException: No stage with id 44959 at 
> org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:104)
>  at 
> org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:430)
>  at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408)
>  at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.immutable.List.foreach(List.scala:381) at 
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
>  at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:408) at 
> org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:502) at 
> org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:244) at 
> org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:293) at 
> org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
> org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) 
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
> at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.eclipse.jetty.server.Server.handle(Server.java:534) at 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) 
> at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> This is the index page. It should not crash even if a stage is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23184) All jobs page is broken when some stage is missing

2018-01-22 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335128#comment-16335128
 ] 

Shixiong Zhu commented on SPARK-23184:
--

This seems to be caused by the fix for SPARK-23051.

> All jobs page is broken when some stage is missing
> --
>
> Key: SPARK-23184
> URL: https://issues.apache.org/jira/browse/SPARK-23184
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> {code}
> h2. HTTP ERROR 500
> Problem accessing /jobs/. Reason:
> Server Error
>  
> h3. Caused by:
> java.util.NoSuchElementException: No stage with id 44959 at 
> org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:104)
>  at 
> org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:430)
>  at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408)
>  at 
> org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.immutable.List.foreach(List.scala:381) at 
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
>  at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:408) at 
> org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:502) at 
> org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:244) at 
> org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:293) at 
> org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
> org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) 
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
> at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.eclipse.jetty.server.Server.handle(Server.java:534) at 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) 
> at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> This is the index page. It should not crash even if a stage is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23184) All jobs page is broken when some stage is missing

2018-01-22 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23184:


 Summary: All jobs page is broken when some stage is missing
 Key: SPARK-23184
 URL: https://issues.apache.org/jira/browse/SPARK-23184
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
Reporter: Shixiong Zhu


{code}
h2. HTTP ERROR 500

Problem accessing /jobs/. Reason:

Server Error

 
h3. Caused by:

java.util.NoSuchElementException: No stage with id 44959 at 
org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:104)
 at 
org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:430)
 at 
org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408) 
at 
org.apache.spark.ui.jobs.JobDataSource$$anonfun$26.apply(AllJobsPage.scala:408) 
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at scala.collection.immutable.List.foreach(List.scala:381) at 
scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
 at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45) at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:408) at 
org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:502) at 
org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:244) at 
org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:293) at 
org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98) at 
org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) 
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
at 
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) 
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
 at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) 
at org.eclipse.jetty.server.Server.handle(Server.java:534) at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
 at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) 
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
 at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
 at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) 
at java.lang.Thread.run(Thread.java:748)

{code}

 

This is the index page. It should not crash even if a stage is missing.
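
A minimal sketch of the defensive lookup the render path needs (hypothetical 
helper, not the actual AppStatusStore API): a stage that has already been 
cleaned up should surface as absent data for that row, not as an HTTP 500 for 
the whole page.

{code}
import scala.util.Try

// Wrap a lookup that throws NoSuchElementException (like lastStageAttempt)
// so the jobs table can render a row with partial data instead of failing.
def lookupOption[T](lookup: => T): Option[T] = Try(lookup).toOption

// Illustrative use inside the row-building code:
// val lastStage = lookupOption(store.lastStageAttempt(job.stageIds.max))
{code}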



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21996) Streaming ignores files with spaces in the file names

2018-01-17 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-21996:


Assignee: Xiayun Sun

> Streaming ignores files with spaces in the file names
> -
>
> Key: SPARK-21996
> URL: https://issues.apache.org/jira/browse/SPARK-21996
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
> Environment: openjdk version "1.8.0_131"
> OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.17.04.3-b11)
> OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Ivan Sharamet
>Assignee: Xiayun Sun
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: spark-streaming.zip
>
>
> I tried to stream text files from a folder and noticed that files inside this 
> folder with spaces in their names are ignored, and there are some warnings in 
> the log:
> {code}
> 17/09/13 16:15:14 WARN InMemoryFileIndex: The directory 
> file:/in/two%20two.txt was not found. Was it deleted very recently?
> {code}
> I suppose this happens because the file path URI gets percent-encoded twice, 
> so the actual URI inside the path objects looks like 
> {{file:/in/two%2520two.txt}}.
> To reproduce the issue, place some text files with spaces in their names in a 
> folder and run some simple streaming code:
> {code:java}
> /in
> /one.txt
> /two two.txt
> /three.txt
> {code}
> {code}
> sparkSession.readStream.textFile("/in")
>   .writeStream
>   .option("checkpointLocation", "/checkpoint")
>   .format("text")
>   .start("/out")
>   .awaitTermination()
> {code}
> The result will contain only the content of {{one.txt}} and {{three.txt}}.
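
A minimal sketch of the double encoding described above, assuming Hadoop's {{Path}} is on the classpath: building a {{Path}} from a string that is already percent-encoded escapes the {{%}} again, so {{%20}} becomes {{%2520}}.

{code}
import org.apache.hadoop.fs.Path

val once  = new Path("/in/two two.txt").toUri.toString  // "/in/two%20two.txt"
val twice = new Path(once).toUri.toString               // "/in/two%2520two.txt" -- the broken URI above
{code}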



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23064) Add documentation for stream-stream joins

2018-01-17 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23064.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20255
[https://github.com/apache/spark/pull/20255]

> Add documentation for stream-stream joins
> -
>
> Key: SPARK-23064
> URL: https://issues.apache.org/jira/browse/SPARK-23064
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.2.1
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Major
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21996) Streaming ignores files with spaces in the file names

2018-01-17 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-21996.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19247
[https://github.com/apache/spark/pull/19247]

> Streaming ignores files with spaces in the file names
> -
>
> Key: SPARK-21996
> URL: https://issues.apache.org/jira/browse/SPARK-21996
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
> Environment: openjdk version "1.8.0_131"
> OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.17.04.3-b11)
> OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Ivan Sharamet
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: spark-streaming.zip
>
>
> I tried to stream text files from a folder and noticed that files with 
> spaces in their names are ignored, with warnings like this in the log:
> {code}
> 17/09/13 16:15:14 WARN InMemoryFileIndex: The directory 
> file:/in/two%20two.txt was not found. Was it deleted very recently?
> {code}
> I suppose this happens because the file path URI gets percent-encoded twice, 
> so the actual URI inside the path objects looks like 
> {{file:/in/two%2520two.txt}}.
> To reproduce the issue, place some text files with spaces in their names in a 
> folder and run some simple streaming code:
> {code:java}
> /in
> /one.txt
> /two two.txt
> /three.txt
> {code}
> {code}
> sparkSession.readStream.textFile("/in")
>   .writeStream
>   .option("checkpointLocation", "/checkpoint")
>   .format("text")
>   .start("/out")
>   .awaitTermination()
> {code}
> The result will contain only the content of {{one.txt}} and {{three.txt}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23119) Fix API annotation in DataSource V2 for streaming

2018-01-17 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23119.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20286
[https://github.com/apache/spark/pull/20286]

> Fix API annotation in DataSource V2 for streaming
> -
>
> Key: SPARK-23119
> URL: https://issues.apache.org/jira/browse/SPARK-23119
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Major
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23093) don't modify run id

2018-01-17 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23093.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20282
[https://github.com/apache/spark/pull/20282]

> don't modify run id
> ---
>
> Key: SPARK-23093
> URL: https://issues.apache.org/jira/browse/SPARK-23093
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.3.0
> Environment: The run ID hooks into the listener pipeline, so it has 
> to stay the same within a user-executed run. We need a different ID for epoch 
> coordinators.
>Reporter: Jose Torres
>Assignee: Jose Torres
>Priority: Major
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23093) don't modify run id

2018-01-17 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-23093:


Assignee: Jose Torres

> don't modify run id
> ---
>
> Key: SPARK-23093
> URL: https://issues.apache.org/jira/browse/SPARK-23093
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.3.0
> Environment: The run ID hooks into the listener pipeline, so it has 
> to stay the same within a user-executed run. We need a different ID for epoch 
> coordinators.
>Reporter: Jose Torres
>Assignee: Jose Torres
>Priority: Major
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21996) Streaming ignores files with spaces in the file names

2018-01-16 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-21996:
-
Component/s: (was: SQL)
 Structured Streaming

> Streaming ignores files with spaces in the file names
> -
>
> Key: SPARK-21996
> URL: https://issues.apache.org/jira/browse/SPARK-21996
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
> Environment: openjdk version "1.8.0_131"
> OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.17.04.3-b11)
> OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Ivan Sharamet
>Priority: Major
> Attachments: spark-streaming.zip
>
>
> I tried to stream text files from a folder and noticed that files with 
> spaces in their names are ignored, with warnings like this in the log:
> {code}
> 17/09/13 16:15:14 WARN InMemoryFileIndex: The directory 
> file:/in/two%20two.txt was not found. Was it deleted very recently?
> {code}
> I suppose this happens because the file path URI gets percent-encoded twice, 
> so the actual URI inside the path objects looks like 
> {{file:/in/two%2520two.txt}}.
> To reproduce the issue, place some text files with spaces in their names in a 
> folder and run some simple streaming code:
> {code:java}
> /in
> /one.txt
> /two two.txt
> /three.txt
> {code}
> {code}
> sparkSession.readStream.textFile("/in")
>   .writeStream
>   .option("checkpointLocation", "/checkpoint")
>   .format("text")
>   .start("/out")
>   .awaitTermination()
> {code}
> The result will contain only the content of {{one.txt}} and {{three.txt}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-16 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327632#comment-16327632
 ] 

Shixiong Zhu commented on SPARK-23050:
--

[~ste...@apache.org] Yeah, that's a good improvement for S3. Is there an API to 
detect S3-like file systems?

> Structured Streaming with S3 file source duplicates data because of eventual 
> consistency.
> -
>
> Key: SPARK-23050
> URL: https://issues.apache.org/jira/browse/SPARK-23050
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Yash Sharma
>Priority: Major
>
> Spark Structured streaming with S3 file source duplicates data because of 
> eventual consistency.
> Reproducing the scenario -
> - Structured streaming reading from S3 source. Writing back to S3.
> - Spark tries to commitTask on completion of a task, by verifying if all the 
> files have been written to Filesystem. 
> {{ManifestFileCommitProtocol.commitTask}}.
> - [Eventual consistency issue] Spark finds that the file is not present and 
> fails the task. {{org.apache.spark.SparkException: Task failed while writing 
> rows. No such file or directory 
> 's3://path/data/part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet'}}
> - By this time S3 eventually gets the file.
> - Spark reruns the task and completes the task, but gets a new file name this 
> time. {{ManifestFileCommitProtocol.newTaskTempFile. 
> part-00256-b62fa7a4-b7e0-43d6-8c38-9705076a7ee1-c000.snappy.parquet.}}
> - Data duplicates in results and the same data is processed twice and written 
> to S3.
> - There is no data duplication if spark is able to list presence of all 
> committed files and all tasks succeed.
> Code:
> {code}
> query = selected_df.writeStream \
> .format("parquet") \
> .option("compression", "snappy") \
> .option("path", "s3://path/data/") \
> .option("checkpointLocation", "s3://path/checkpoint/") \
> .start()
> {code}
> Same sized duplicate S3 Files:
> {code}
> $ aws s3 ls s3://path/data/ | grep part-00256
> 2018-01-11 03:37:00  17070 
> part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet
> 2018-01-11 03:37:10  17070 
> part-00256-b62fa7a4-b7e0-43d6-8c38-9705076a7ee1-c000.snappy.parquet
> {code}
> Exception on S3 listing and task failure:
> {code}
> [Stage 5:>(277 + 100) / 
> 597]18/01/11 03:36:59 WARN TaskSetManager: Lost task 256.0 in stage 5.0 (TID  
> org.apache.spark.SparkException: Task failed while writing rows
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.io.FileNotFoundException: No such file or directory 
> 's3://path/data/part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet'
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:816)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
>   at 
> org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol$$anonfun$4.apply(ManifestFileCommitProtocol.scala:109)
>   at 
> org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol$$anonfun$4.apply(ManifestFileCommitProtocol.scala:109)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> 

[jira] [Updated] (SPARK-22956) Union Stream Failover Cause `IllegalStateException`

2018-01-15 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-22956:
-
Fix Version/s: (was: 2.3.0)
   2.4.0
   2.3.1

> Union Stream Failover Cause `IllegalStateException`
> ---
>
> Key: SPARK-22956
> URL: https://issues.apache.org/jira/browse/SPARK-22956
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.0
>Reporter: Li Yuanjian
>Assignee: Li Yuanjian
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> When we union two streams from Kafka or other sources, and one of them has 
> no continuous data coming in while a task restarts at the same time, an 
> `IllegalStateException` is thrown. This is mainly caused by the code in 
> [MicroBatchExecution|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L190]:
> when one stream has no continuous data, its committedOffset equals its 
> availableOffset during `populateStartOffsets`, and `currentPartitionOffsets` 
> is not handled properly in KafkaSource. Maybe we should also consider this 
> scenario for other Sources.
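
A minimal sketch of the scenario (hypothetical broker and topic names; not the actual regression test): union two Kafka streams where one topic receives no new data, then restart the query from its checkpoint.

{code}
val busy = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "busy-topic").load()
val idle = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "idle-topic").load()   // receives no new data

busy.union(idle).writeStream
  .format("console")
  .option("checkpointLocation", "/checkpoint/union")
  .start()
// Kill and restart the application: on recovery, populateStartOffsets sees
// committedOffset == availableOffset for the idle stream.
{code}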



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22956) Union Stream Failover Cause `IllegalStateException`

2018-01-15 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-22956.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20150
[https://github.com/apache/spark/pull/20150]

> Union Stream Failover Cause `IllegalStateException`
> ---
>
> Key: SPARK-22956
> URL: https://issues.apache.org/jira/browse/SPARK-22956
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.0
>Reporter: Li Yuanjian
>Assignee: Li Yuanjian
>Priority: Major
> Fix For: 2.3.0
>
>
> When we union two streams from Kafka or other sources, and one of them has 
> no continuous data coming in while a task restarts at the same time, an 
> `IllegalStateException` is thrown. This is mainly caused by the code in 
> [MicroBatchExecution|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L190]:
> when one stream has no continuous data, its committedOffset equals its 
> availableOffset during `populateStartOffsets`, and `currentPartitionOffsets` 
> is not handled properly in KafkaSource. Maybe we should also consider this 
> scenario for other Sources.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22956) Union Stream Failover Cause `IllegalStateException`

2018-01-15 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-22956:


Assignee: Li Yuanjian

> Union Stream Failover Cause `IllegalStateException`
> ---
>
> Key: SPARK-22956
> URL: https://issues.apache.org/jira/browse/SPARK-22956
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.0
>Reporter: Li Yuanjian
>Assignee: Li Yuanjian
>Priority: Major
>
> When we union two streams from Kafka or other sources, and one of them has 
> no continuous data coming in while a task restarts at the same time, an 
> `IllegalStateException` is thrown. This is mainly caused by the code in 
> [MicroBatchExecution|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L190]:
> when one stream has no continuous data, its committedOffset equals its 
> availableOffset during `populateStartOffsets`, and `currentPartitionOffsets` 
> is not handled properly in KafkaSource. Maybe we should also consider this 
> scenario for other Sources.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-12 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324717#comment-16324717
 ] 

Shixiong Zhu commented on SPARK-23050:
--

[~yash...@gmail.com] Spark SQL should handle it. Yeah, unfortunately, external 
tools such as Presto and Hive don't understand the logs. "s3a" will not make any 
difference and is not related to this issue.

> Structured Streaming with S3 file source duplicates data because of eventual 
> consistency.
> -
>
> Key: SPARK-23050
> URL: https://issues.apache.org/jira/browse/SPARK-23050
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Yash Sharma
>
> Spark Structured streaming with S3 file source duplicates data because of 
> eventual consistency.
> Reproducing the scenario -
> - Structured streaming reading from S3 source. Writing back to S3.
> - Spark tries to commitTask on completion of a task, by verifying if all the 
> files have been written to Filesystem. 
> {{ManifestFileCommitProtocol.commitTask}}.
> - [Eventual consistency issue] Spark finds that the file is not present and 
> fails the task. {{org.apache.spark.SparkException: Task failed while writing 
> rows. No such file or directory 
> 's3://path/data/part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet'}}
> - By this time S3 eventually gets the file.
> - Spark reruns the task and completes the task, but gets a new file name this 
> time. {{ManifestFileCommitProtocol.newTaskTempFile. 
> part-00256-b62fa7a4-b7e0-43d6-8c38-9705076a7ee1-c000.snappy.parquet.}}
> - Data duplicates in results and the same data is processed twice and written 
> to S3.
> - There is no data duplication if spark is able to list presence of all 
> committed files and all tasks succeed.
> Code:
> {code}
> query = selected_df.writeStream \
> .format("parquet") \
> .option("compression", "snappy") \
> .option("path", "s3://path/data/") \
> .option("checkpointLocation", "s3://path/checkpoint/") \
> .start()
> {code}
> Same sized duplicate S3 Files:
> {code}
> $ aws s3 ls s3://path/data/ | grep part-00256
> 2018-01-11 03:37:00  17070 
> part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet
> 2018-01-11 03:37:10  17070 
> part-00256-b62fa7a4-b7e0-43d6-8c38-9705076a7ee1-c000.snappy.parquet
> {code}
> Exception on S3 listing and task failure:
> {code}
> [Stage 5:>(277 + 100) / 
> 597]18/01/11 03:36:59 WARN TaskSetManager: Lost task 256.0 in stage 5.0 (TID  
> org.apache.spark.SparkException: Task failed while writing rows
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.io.FileNotFoundException: No such file or directory 
> 's3://path/data/part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet'
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:816)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
>   at 
> org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol$$anonfun$4.apply(ManifestFileCommitProtocol.scala:109)
>   at 
> org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol$$anonfun$4.apply(ManifestFileCommitProtocol.scala:109)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> 

[jira] [Assigned] (SPARK-22975) MetricsReporter producing NullPointerException when there was no progress reported

2018-01-12 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-22975:


Assignee: Marco Gaido

> MetricsReporter producing NullPointerException when there was no progress 
> reported
> --
>
> Key: SPARK-22975
> URL: https://issues.apache.org/jira/browse/SPARK-22975
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0, 2.2.1
>Reporter: Yuriy Bondaruk
>Assignee: Marco Gaido
> Fix For: 2.2.2, 2.3.0
>
>
> The exception occurs in MetricsReporter when it tries to register gauges 
> using lastProgress of each stream.
> {code:java}
>   registerGauge("inputRate-total", () => 
> stream.lastProgress.inputRowsPerSecond)
>   registerGauge("processingRate-total", () => 
> stream.lastProgress.inputRowsPerSecond)
>   registerGauge("latency", () => 
> stream.lastProgress.durationMs.get("triggerExecution").longValue())
> {code}
> If a stream has not reported any progress yet, the following exception 
> occurs:
> {noformat}
> 18/01/05 17:45:57 ERROR ScheduledReporter: RuntimeException thrown from 
> CloudwatchReporter#report. Exception was suppressed.
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anonfun$1.apply$mcD$sp(MetricsReporter.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anonfun$1.apply(MetricsReporter.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anonfun$1.apply(MetricsReporter.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anon$1.getValue(MetricsReporter.scala:49)
>   at 
> amazon.nexus.spark.metrics.cloudwatch.CloudwatchReporter.lambda$createNumericGaugeMetricDatumStream$0(CloudwatchReporter.java:146)
>   at 
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at 
> java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.lambda$entryConsumer$0(Collections.java:1575)
>   at 
> java.util.TreeMap$EntrySpliterator.forEachRemaining(TreeMap.java:2969)
>   at 
> java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntrySetSpliterator.forEachRemaining(Collections.java:1600)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:510)
>   at 
> amazon.nexus.spark.metrics.cloudwatch.CloudwatchReporter.partitionIntoSublists(CloudwatchReporter.java:390)
>   at 
> amazon.nexus.spark.metrics.cloudwatch.CloudwatchReporter.report(CloudwatchReporter.java:137)
>   at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
>   at 
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
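
A null-safe variant of the gauges above (a sketch, not necessarily the actual patch; it assumes the same registerGauge helper): wrap {{lastProgress}} in an {{Option}} and fall back to a default until the stream reports its first progress.

{code}
registerGauge("inputRate-total", () =>
  Option(stream.lastProgress).map(_.inputRowsPerSecond).getOrElse(0.0))
registerGauge("latency", () =>
  Option(stream.lastProgress)
    .map(_.durationMs.get("triggerExecution").longValue())
    .getOrElse(0L))
{code}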



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: 

[jira] [Resolved] (SPARK-22975) MetricsReporter producing NullPointerException when there was no progress reported

2018-01-12 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-22975.
--
   Resolution: Fixed
Fix Version/s: 2.3.0
   2.2.2

Issue resolved by pull request 20189
[https://github.com/apache/spark/pull/20189]

> MetricsReporter producing NullPointerException when there was no progress 
> reported
> --
>
> Key: SPARK-22975
> URL: https://issues.apache.org/jira/browse/SPARK-22975
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0, 2.2.1
>Reporter: Yuriy Bondaruk
> Fix For: 2.2.2, 2.3.0
>
>
> The exception occurs in MetricsReporter when it tries to register gauges 
> using lastProgress of each stream.
> {code:java}
>   registerGauge("inputRate-total", () => 
> stream.lastProgress.inputRowsPerSecond)
>   registerGauge("processingRate-total", () => 
> stream.lastProgress.inputRowsPerSecond)
>   registerGauge("latency", () => 
> stream.lastProgress.durationMs.get("triggerExecution").longValue())
> {code}
> If a stream has not reported any progress yet, the following exception 
> occurs:
> {noformat}
> 18/01/05 17:45:57 ERROR ScheduledReporter: RuntimeException thrown from 
> CloudwatchReporter#report. Exception was suppressed.
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anonfun$1.apply$mcD$sp(MetricsReporter.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anonfun$1.apply(MetricsReporter.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anonfun$1.apply(MetricsReporter.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anon$1.getValue(MetricsReporter.scala:49)
>   at 
> amazon.nexus.spark.metrics.cloudwatch.CloudwatchReporter.lambda$createNumericGaugeMetricDatumStream$0(CloudwatchReporter.java:146)
>   at 
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at 
> java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.lambda$entryConsumer$0(Collections.java:1575)
>   at 
> java.util.TreeMap$EntrySpliterator.forEachRemaining(TreeMap.java:2969)
>   at 
> java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntrySetSpliterator.forEachRemaining(Collections.java:1600)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:510)
>   at 
> amazon.nexus.spark.metrics.cloudwatch.CloudwatchReporter.partitionIntoSublists(CloudwatchReporter.java:390)
>   at 
> amazon.nexus.spark.metrics.cloudwatch.CloudwatchReporter.report(CloudwatchReporter.java:137)
>   at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
>   at 
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SPARK-21475) Change to use NIO's Files API for external shuffle service

2018-01-11 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323617#comment-16323617
 ] 

Shixiong Zhu commented on SPARK-21475:
--

Fixed

> Change to use NIO's Files API for external shuffle service
> --
>
> Key: SPARK-21475
> URL: https://issues.apache.org/jira/browse/SPARK-21475
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 2.3.0
>
>
> Java's {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}}; 
> even when such a stream is closed correctly and promptly, it still leaves a 
> memory footprint that is only cleaned up during a Full GC. This introduces 
> two side effects:
> 1. Lots of Finalizer references are kept in memory, increasing the memory 
> overhead. In our external shuffle service use case, a busy shuffle service 
> accumulates a bunch of these objects, potentially leading to OOM.
> 2. Finalizers only run during a Full GC, which increases the Full GC 
> overhead and leads to long GC pauses.
> To fix this potential issue, we propose using NIO's 
> Files#newInput/OutputStream instead on some critical paths such as shuffle.
> https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful
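
A minimal sketch of the proposed change (hypothetical file paths): NIO's {{Files}} factory methods return streams that do not override {{finalize()}}, so nothing is queued for the Finalizer thread.

{code}
import java.nio.file.{Files, Paths}

// Instead of new FileInputStream(...) / new FileOutputStream(...):
val in  = Files.newInputStream(Paths.get("/data/shuffle_0_0.data"))
val out = Files.newOutputStream(Paths.get("/data/shuffle_0_0.index"))
try {
  val buf = new Array[Byte](8192)
  var n = in.read(buf)
  while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
} finally {
  in.close()
  out.close()
}
{code}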



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21475) Change to use NIO's Files API for external shuffle service

2018-01-11 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-21475:
-
Fix Version/s: (was: 3.0.0)

> Change to use NIO's Files API for external shuffle service
> --
>
> Key: SPARK-21475
> URL: https://issues.apache.org/jira/browse/SPARK-21475
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 2.3.0
>
>
> Java's {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}}; 
> even when such a stream is closed correctly and promptly, it still leaves a 
> memory footprint that is only cleaned up during a Full GC. This introduces 
> two side effects:
> 1. Lots of Finalizer references are kept in memory, increasing the memory 
> overhead. In our external shuffle service use case, a busy shuffle service 
> accumulates a bunch of these objects, potentially leading to OOM.
> 2. Finalizers only run during a Full GC, which increases the Full GC 
> overhead and leads to long GC pauses.
> To fix this potential issue, we propose using NIO's 
> Files#newInput/OutputStream instead on some critical paths such as shuffle.
> https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23051) job description in Spark UI is broken

2018-01-11 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323360#comment-16323360
 ] 

Shixiong Zhu commented on SPARK-23051:
--

I marked this is a blocker since it's a regression.

> job description in Spark UI is broken 
> --
>
> Key: SPARK-23051
> URL: https://issues.apache.org/jira/browse/SPARK-23051
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>  Labels: regression
>
> In previous versions, the Spark UI would use the stage description if the 
> job description was not set. But right now it’s just empty.
> Reproducer: just run the following code in the Spark shell and check the UI:
> {code}
> val q = 
> spark.readStream.format("rate").load().writeStream.format("console").start()
> Thread.sleep(2000)
> q.stop()
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23051) job description in Spark UI is broken

2018-01-11 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-23051:
-
Labels: regression  (was: )

> job description in Spark UI is broken 
> --
>
> Key: SPARK-23051
> URL: https://issues.apache.org/jira/browse/SPARK-23051
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>  Labels: regression
>
> In previous versions, the Spark UI would use the stage description if the 
> job description was not set. But right now it’s just empty.
> Reproducer: just run the following code in the Spark shell and check the UI:
> {code}
> val q = 
> spark.readStream.format("rate").load().writeStream.format("console").start()
> Thread.sleep(2000)
> q.stop()
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23051) job description in Spark UI is broken

2018-01-11 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23051:


 Summary: job description in Spark UI is broken 
 Key: SPARK-23051
 URL: https://issues.apache.org/jira/browse/SPARK-23051
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
Reporter: Shixiong Zhu
Priority: Blocker


In previous versions, the Spark UI would use the stage description if the job 
description was not set. But right now it’s just empty.

Reproducer: just run the following code in the Spark shell and check the UI:
{code}
val q = 
spark.readStream.format("rate").load().writeStream.format("console").start()
Thread.sleep(2000)
q.stop()
{code}
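
A possible workaround while the regression stands (an assumption, not the fix; whether the label propagates to the stream execution thread is untested here): set an explicit job description before starting the query.

{code}
spark.sparkContext.setJobDescription("rate-to-console smoke test")  // hypothetical label
val q = 
spark.readStream.format("rate").load().writeStream.format("console").start()
Thread.sleep(2000)
q.stop()
{code}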



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-11 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323320#comment-16323320
 ] 

Shixiong Zhu edited comment on SPARK-23050 at 1/12/18 12:56 AM:


How do you read the output? If you use Spark to read the output, it will only 
read the successful files which are stored in the file sink metadata. 


was (Author: zsxwing):
How do you read the output? If you use Spark to read the output, it will only 
read the successful files which are stored in the query metadata. 

> Structured Streaming with S3 file source duplicates data because of eventual 
> consistency.
> -
>
> Key: SPARK-23050
> URL: https://issues.apache.org/jira/browse/SPARK-23050
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Yash Sharma
>
> Spark Structured streaming with S3 file source duplicates data because of 
> eventual consistency.
> Reproducing the scenario -
> - Structured streaming reading from S3 source. Writing back to S3.
> - Spark tries to commitTask on completion of a task, by verifying if all the 
> files have been written to Filesystem. 
> {{ManifestFileCommitProtocol.commitTask}}.
> - [Eventual consistency issue] Spark finds that the file is not present and 
> fails the task. {{org.apache.spark.SparkException: Task failed while writing 
> rows. No such file or directory 
> 's3://path/data/part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet'}}
> - By this time S3 eventually gets the file.
> - Spark reruns the task and completes the task, but gets a new file name this 
> time. {{ManifestFileCommitProtocol.newTaskTempFile. 
> part-00256-b62fa7a4-b7e0-43d6-8c38-9705076a7ee1-c000.snappy.parquet.}}
> - Data duplicates in results and the same data is processed twice and written 
> to S3.
> - There is no data duplication if spark is able to list presence of all 
> committed files and all tasks succeed.
> Code:
> {code}
> query = selected_df.writeStream \
> .format("parquet") \
> .option("compression", "snappy") \
> .option("path", "s3://path/data/") \
> .option("checkpointLocation", "s3://path/checkpoint/") \
> .start()
> {code}
> Same sized duplicate S3 Files:
> {code}
> $ aws s3 ls s3://path/data/ | grep part-00256
> 2018-01-11 03:37:00  17070 
> part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet
> 2018-01-11 03:37:10  17070 
> part-00256-b62fa7a4-b7e0-43d6-8c38-9705076a7ee1-c000.snappy.parquet
> {code}
> Exception on S3 listing and task failure:
> {code}
> [Stage 5:>(277 + 100) / 
> 597]18/01/11 03:36:59 WARN TaskSetManager: Lost task 256.0 in stage 5.0 (TID  
> org.apache.spark.SparkException: Task failed while writing rows
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.io.FileNotFoundException: No such file or directory 
> 's3://path/data/part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet'
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:816)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
>   at 
> org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol$$anonfun$4.apply(ManifestFileCommitProtocol.scala:109)
>   at 
> org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol$$anonfun$4.apply(ManifestFileCommitProtocol.scala:109)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 

[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-11 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323320#comment-16323320
 ] 

Shixiong Zhu commented on SPARK-23050:
--

How do you read the output? If you use Spark to read the output, it will only 
read the successful files which are stored in the query metadata. 
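
For reference, a minimal sketch of reading the output through Spark (paths copied from the report above): the reader picks up the {{_spark_metadata}} file sink log, so only files recorded by successful commits are read and orphaned duplicates are skipped.

{code}
val out = spark.read.parquet("s3://path/data/")
out.count()   // counts only committed files
{code}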

> Structured Streaming with S3 file source duplicates data because of eventual 
> consistency.
> -
>
> Key: SPARK-23050
> URL: https://issues.apache.org/jira/browse/SPARK-23050
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Yash Sharma
>
> Spark Structured streaming with S3 file source duplicates data because of 
> eventual consistency.
> Reproducing the scenario -
> - Structured streaming reading from S3 source. Writing back to S3.
> - Spark tries to commitTask on completion of a task, by verifying if all the 
> files have been written to Filesystem. 
> {{ManifestFileCommitProtocol.commitTask}}.
> - [Eventual consistency issue] Spark finds that the file is not present and 
> fails the task. {{org.apache.spark.SparkException: Task failed while writing 
> rows. No such file or directory 
> 's3://path/data/part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet'}}
> - By this time S3 eventually gets the file.
> - Spark reruns the task and completes the task, but gets a new file name this 
> time. {{ManifestFileCommitProtocol.newTaskTempFile. 
> part-00256-b62fa7a4-b7e0-43d6-8c38-9705076a7ee1-c000.snappy.parquet.}}
> - Data duplicates in results and the same data is processed twice and written 
> to S3.
> - There is no data duplication if spark is able to list presence of all 
> committed files and all tasks succeed.
> Code:
> {code}
> query = selected_df.writeStream \
> .format("parquet") \
> .option("compression", "snappy") \
> .option("path", "s3://path/data/") \
> .option("checkpointLocation", "s3://path/checkpoint/") \
> .start()
> {code}
> Same sized duplicate S3 Files:
> {code}
> $ aws s3 ls s3://path/data/ | grep part-00256
> 2018-01-11 03:37:00  17070 
> part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet
> 2018-01-11 03:37:10  17070 
> part-00256-b62fa7a4-b7e0-43d6-8c38-9705076a7ee1-c000.snappy.parquet
> {code}
> Exception on S3 listing and task failure:
> {code}
> [Stage 5:>(277 + 100) / 
> 597]18/01/11 03:36:59 WARN TaskSetManager: Lost task 256.0 in stage 5.0 (TID  
> org.apache.spark.SparkException: Task failed while writing rows
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.io.FileNotFoundException: No such file or directory 
> 's3://path/data/part-00256-65ae782d-e32e-48fb-8652-e1d0defc370b-c000.snappy.parquet'
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:816)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
>   at 
> org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol$$anonfun$4.apply(ManifestFileCommitProtocol.scala:109)
>   at 
> org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol$$anonfun$4.apply(ManifestFileCommitProtocol.scala:109)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> 

[jira] [Resolved] (SPARK-22989) sparkstreaming ui show 0 records when spark-streaming-kafka application restore from checkpoint

2018-01-10 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-22989.
--
Resolution: Duplicate

> sparkstreaming ui show 0 records when spark-streaming-kafka application 
> restore from checkpoint 
> 
>
> Key: SPARK-22989
> URL: https://issues.apache.org/jira/browse/SPARK-22989
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.2.0
>Reporter: zhaoshijie
>Priority: Minor
>
> When a spark-streaming-kafka application restores from a checkpoint, the 
> Spark Streaming UI shows 0 records for each batch.
> !https://raw.githubusercontent.com/smdfj/picture/master/spark/batch.png!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22991) High read latency with spark streaming 2.2.1 and kafka 0.10.0.1

2018-01-10 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-22991:
-
Component/s: (was: Structured Streaming)
 (was: Spark Core)
 DStreams

> High read latency with spark streaming 2.2.1 and kafka 0.10.0.1
> ---
>
> Key: SPARK-22991
> URL: https://issues.apache.org/jira/browse/SPARK-22991
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.2.1
>Reporter: Kiran Shivappa Japannavar
>Priority: Critical
>
> Spark 2.2.1 + Kafka 0.10 + Spark Streaming.
> Batch duration is 1s, max rate per partition is 500, poll interval is 120 
> seconds, max poll records is 500, the number of Kafka partitions is 500, and 
> the cached consumer is enabled.
> While trying to read data from Kafka we intermittently observe very high 
> read latencies. The high latencies cause the Kafka consumer session to 
> expire, so the Kafka brokers remove the consumer from the group. The 
> consumer keeps retrying and finally fails with:
> [org.apache.kafka.clients.NetworkClient] - Disconnecting from node 12 due to 
> request timeout
> [org.apache.kafka.clients.NetworkClient] - Cancelled request ClientRequest
> [org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient] - 
> Cancelled FETCH request ClientRequest.
> Because of this, a lot of batches sit in the queued state.
> The high read latencies occur whenever multiple clients read from the same 
> Kafka cluster in parallel. The Kafka cluster has a large number of brokers 
> and can support high network bandwidth.
> When running Spark 1.5 with the Kafka 0.8 consumer client against the same 
> Kafka cluster, we do not see any read latencies.
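
A sketch of the configuration described above (hypothetical master, broker, and group names; mapping the reported settings onto the usual spark-streaming-kafka-0-10 knobs is an assumption):

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[*]").setAppName("kafka-read-latency")
  .set("spark.streaming.kafka.maxRatePerPartition", "500")  // max rate per partition
  .set("spark.streaming.kafka.consumer.poll.ms", "120000")  // 120s poll interval
val ssc = new StreamingContext(conf, Seconds(1))            // 1s batch duration

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "group.id"          -> "reader-group",
  "max.poll.records"  -> "500")
{code}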



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22975) MetricsReporter producing NullPointerException when there was no progress reported

2018-01-10 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-22975:
-
Component/s: (was: SQL)
 Structured Streaming

> MetricsReporter producing NullPointerException when there was no progress 
> reported
> --
>
> Key: SPARK-22975
> URL: https://issues.apache.org/jira/browse/SPARK-22975
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0, 2.2.1
>Reporter: Yuriy Bondaruk
>
> The exception occurs in MetricsReporter when it tries to register gauges 
> using lastProgress of each stream.
> {code:java}
>   registerGauge("inputRate-total", () => 
> stream.lastProgress.inputRowsPerSecond)
>   registerGauge("processingRate-total", () => 
> stream.lastProgress.inputRowsPerSecond)
>   registerGauge("latency", () => 
> stream.lastProgress.durationMs.get("triggerExecution").longValue())
> {code}
> If a stream has not reported any progress yet, the following exception 
> occurs:
> {noformat}
> 18/01/05 17:45:57 ERROR ScheduledReporter: RuntimeException thrown from 
> CloudwatchReporter#report. Exception was suppressed.
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anonfun$1.apply$mcD$sp(MetricsReporter.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anonfun$1.apply(MetricsReporter.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anonfun$1.apply(MetricsReporter.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anon$1.getValue(MetricsReporter.scala:49)
>   at 
> amazon.nexus.spark.metrics.cloudwatch.CloudwatchReporter.lambda$createNumericGaugeMetricDatumStream$0(CloudwatchReporter.java:146)
>   at 
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at 
> java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.lambda$entryConsumer$0(Collections.java:1575)
>   at 
> java.util.TreeMap$EntrySpliterator.forEachRemaining(TreeMap.java:2969)
>   at 
> java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntrySetSpliterator.forEachRemaining(Collections.java:1600)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:510)
>   at 
> amazon.nexus.spark.metrics.cloudwatch.CloudwatchReporter.partitionIntoSublists(CloudwatchReporter.java:390)
>   at 
> amazon.nexus.spark.metrics.cloudwatch.CloudwatchReporter.report(CloudwatchReporter.java:137)
>   at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
>   at 
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For 

[jira] [Commented] (SPARK-21760) Structured streaming terminates with Exception

2018-01-04 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312089#comment-16312089
 ] 

Shixiong Zhu commented on SPARK-21760:
--

Could you try 2.2.1? It's probably fixed by this line: 
https://github.com/apache/spark/commit/6edfff055caea81dc3a98a6b4081313a0c0b0729#diff-aaeb546880508bb771df502318c40a99L126

> Structured streaming terminates with Exception 
> ---
>
> Key: SPARK-21760
> URL: https://issues.apache.org/jira/browse/SPARK-21760
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.0
>Reporter: gagan taneja
>Priority: Critical
>
> We have seen Structured Streaming stop with the exception below.
> While analyzing the content, we found that the latest log file has just one 
> line, containing only the version:
> {quote}hdfs dfs -cat warehouse/latency_internal/_spark_metadata/1683
> v1
> {quote}
> The exception is:
> Exception in thread "stream execution thread for latency_internal [id = 39f35d01-60d5-40b4-826e-99e5e38d0077, runId = 95c95a01-bd4f-4604-8aae-c0c5d3e873e8]" java.lang.IllegalStateException: Incomplete log file
> at org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.deserialize(CompactibleFileStreamLog.scala:147)
> at org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.deserialize(CompactibleFileStreamLog.scala:42)
> at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:237)
> at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$getLatest$1.apply$mcVJ$sp(HDFSMetadataLog.scala:266)
> at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$getLatest$1.apply(HDFSMetadataLog.scala:265)
> at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$getLatest$1.apply(HDFSMetadataLog.scala:265)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:246)
> at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.getLatest(HDFSMetadataLog.scala:265)
> at org.apache.spark.sql.execution.streaming.FileStreamSource.<init>(FileStreamSource.scala:60)
> at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:256)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:127)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:123)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
> at 
> 
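
For context, the error above comes from the metadata log deserializer rejecting a batch file that contains nothing after its version header. A simplified sketch (hypothetical code, not the actual Spark source) of the check in {{CompactibleFileStreamLog.deserialize}}:

{code:scala}
import java.io.InputStream
import scala.io.Source

// Simplified sketch: a metadata log file must contain a version line ("v1")
// followed by at least one serialized entry. A file holding only "v1" --
// like _spark_metadata/1683 above -- fails the second check.
def deserialize(in: InputStream): Array[String] = {
  val lines = Source.fromInputStream(in, "UTF-8").getLines()
  if (!lines.hasNext) {
    throw new IllegalStateException("Incomplete log file")
  }
  val version = lines.next() // expected: "v1"
  require(version == "v1", s"Unsupported log version: $version")
  if (!lines.hasNext) {
    // the state described in this report: only the version line is present
    throw new IllegalStateException("Incomplete log file")
  }
  lines.toArray
}
{code}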

[jira] [Resolved] (SPARK-21475) Change the usage of FileInputStream/OutputStream to Files.newInput/OutputStream in the critical path

2018-01-04 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-21475.
--
   Resolution: Fixed
Fix Version/s: 2.3.0
   3.0.0

Issue resolved by pull request 20144
[https://github.com/apache/spark/pull/20144]

> Change the usage of FileInputStream/OutputStream to 
> Files.newInput/OutputStream in the critical path
> 
>
> Key: SPARK-21475
> URL: https://issues.apache.org/jira/browse/SPARK-21475
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
>
> Java's {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}}. 
> Even when such a stream is closed correctly and promptly, it still leaves a 
> memory footprint that is only cleaned up during a Full GC. This introduces 
> two side effects:
> 1. Many Finalizer references are kept in memory, increasing memory overhead. 
> In our use case of the external shuffle service, a busy shuffle service 
> accumulates a bunch of these objects and can eventually run out of memory.
> 2. Finalizers only run during a Full GC, which increases Full GC overhead 
> and leads to long GC pauses.
> To fix this potential issue, we propose using NIO's 
> {{Files#newInput/OutputStream}} instead in critical paths such as shuffle.
> https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful
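
As an illustration of the proposed change, a minimal before/after sketch (the file path is hypothetical):

{code:scala}
import java.io.{File, FileInputStream}
import java.nio.file.Files

val f = new File("/tmp/shuffle-block.data") // hypothetical path

// Before: FileInputStream overrides finalize(), so every instance is tracked
// by a java.lang.ref.Finalizer reference that is only processed during a
// Full GC, even when the stream is closed promptly.
val legacy = new FileInputStream(f)
try { /* read */ } finally { legacy.close() }

// After: the stream returned by Files.newInputStream has no finalizer, so
// closing it releases everything without waiting for a Full GC.
val nio = Files.newInputStream(f.toPath)
try { /* read */ } finally { nio.close() }
{code}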



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21475) Change to use NIO's Files API for external shuffle service

2018-01-04 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-21475:
-
Summary: Change to use NIO's Files API for external shuffle service  (was: 
Change the usage of FileInputStream/OutputStream to Files.newInput/OutputStream 
in the critical path)

> Change to use NIO's Files API for external shuffle service
> --
>
> Key: SPARK-21475
> URL: https://issues.apache.org/jira/browse/SPARK-21475
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 2.3.0, 3.0.0
>
>
> Java's {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}}. 
> Even when such a stream is closed correctly and promptly, it still leaves a 
> memory footprint that is only cleaned up during a Full GC. This introduces 
> two side effects:
> 1. Many Finalizer references are kept in memory, increasing memory overhead. 
> In our use case of the external shuffle service, a busy shuffle service 
> accumulates a bunch of these objects and can eventually run out of memory.
> 2. Finalizers only run during a Full GC, which increases Full GC overhead 
> and leads to long GC pauses.
> To fix this potential issue, we propose using NIO's 
> {{Files#newInput/OutputStream}} instead in critical paths such as shuffle.
> https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21475) Change the usage of FileInputStream/OutputStream to Files.newInput/OutputStream in the critical path

2017-12-29 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-21475:
-
Fix Version/s: (was: 2.3.0)

> Change the usage of FileInputStream/OutputStream to 
> Files.newInput/OutputStream in the critical path
> 
>
> Key: SPARK-21475
> URL: https://issues.apache.org/jira/browse/SPARK-21475
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
>
> Java's {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}}. 
> Even when such a stream is closed correctly and promptly, it still leaves a 
> memory footprint that is only cleaned up during a Full GC. This introduces 
> two side effects:
> 1. Many Finalizer references are kept in memory, increasing memory overhead. 
> In our use case of the external shuffle service, a busy shuffle service 
> accumulates a bunch of these objects and can eventually run out of memory.
> 2. Finalizers only run during a Full GC, which increases Full GC overhead 
> and leads to long GC pauses.
> To fix this potential issue, we propose using NIO's 
> {{Files#newInput/OutputStream}} instead in critical paths such as shuffle.
> https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22863) Make MicroBatchExecution also support MicroBatchRead/WriteSupport

2017-12-29 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-22863.
--
Resolution: Duplicate

> Make MicroBatchExecution also support MicroBatchRead/WriteSupport
> -
>
> Key: SPARK-22863
> URL: https://issues.apache.org/jira/browse/SPARK-22863
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-21475) Change the usage of FileInputStream/OutputStream to Files.newInput/OutputStream in the critical path

2017-12-29 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reopened SPARK-21475:
--

I sent https://github.com/apache/spark/pull/20119 to revert this due to a 
performance regression.

> Change the usage of FileInputStream/OutputStream to 
> Files.newInput/OutputStream in the critical path
> 
>
> Key: SPARK-21475
> URL: https://issues.apache.org/jira/browse/SPARK-21475
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 2.3.0
>
>
> Java's {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}}. 
> Even when such a stream is closed correctly and promptly, it still leaves a 
> memory footprint that is only cleaned up during a Full GC. This introduces 
> two side effects:
> 1. Many Finalizer references are kept in memory, increasing memory overhead. 
> In our use case of the external shuffle service, a busy shuffle service 
> accumulates a bunch of these objects and can eventually run out of memory.
> 2. Finalizers only run during a Full GC, which increases Full GC overhead 
> and leads to long GC pauses.
> To fix this potential issue, we propose using NIO's 
> {{Files#newInput/OutputStream}} instead in critical paths such as shuffle.
> https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22909) Move Structured Streaming v2 APIs to streaming package

2017-12-27 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-22909:
-
Target Version/s: 2.3.0

> Move Structured Streaming v2 APIs to streaming package
> --
>
> Key: SPARK-22909
> URL: https://issues.apache.org/jira/browse/SPARK-22909
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22909) Move Structured Streaming v2 APIs to streaming package

2017-12-27 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-22909:
-
Priority: Blocker  (was: Major)

> Move Structured Streaming v2 APIs to streaming package
> --
>
> Key: SPARK-22909
> URL: https://issues.apache.org/jira/browse/SPARK-22909
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


