[jira] [Commented] (SPARK-15703) Make ListenerBus event queue size configurable

2018-04-16 Thread rohit verma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440377#comment-16440377
 ] 

rohit verma commented on SPARK-15703:
-------------------------------------

Hi Guys, 

I am wondering whether this issue was actually resolved in the 2.0.x/2.1.x 
releases, because I am using Spark 2.2.0 and am facing the same issue. I did 
not find anything documented about it.
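
For reference, the fix for this ticket shipped in 2.0.1/2.1.0 (see Fix Version 
below), so the knob should be present in 2.2.0 as well. A minimal sketch of 
raising the queue size, assuming the spark.scheduler.listenerbus.eventqueue.size 
key this change introduced (default 10000; later releases renamed it to 
spark.scheduler.listenerbus.eventqueue.capacity):

import org.apache.spark.sql.SparkSession

// Raise the listener-bus event queue so that bursty task-end events are less
// likely to be dropped before the UI listeners can consume them.
val spark = SparkSession.builder()
  .appName("listener-bus-queue-demo")
  .config("spark.scheduler.listenerbus.eventqueue.size", "100000")
  .getOrCreate()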

 

Thanks.

> Make ListenerBus event queue size configurable
> ----------------------------------------------
>
> Key: SPARK-15703
> URL: https://issues.apache.org/jira/browse/SPARK-15703
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Web UI
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Assignee: Dhruve Ashar
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
> Attachments: Screen Shot 2016-06-01 at 11.21.32 AM.png, Screen Shot 
> 2016-06-01 at 11.23.48 AM.png, SparkListenerBus .png, 
> spark-dynamic-executor-allocation.png
>
>
> The Spark UI doesn't seem to be showing all the tasks and metrics.
> I ran a job with 100000 tasks but the Detail stage page says it completed 93029:
> Summary Metrics for 93029 Completed Tasks
> The Stages for All Jobs page lists only 89519/100000 tasks as finished, even 
> though the stage completed.  The metrics for shuffle write and input are also 
> incorrect.
> I will attach screen shots.
> I checked the logs and it does show that all the tasks actually finished.
> 16/06/01 16:15:42 INFO TaskSetManager: Finished task 59880.0 in stage 2.0 
> (TID 54038) in 265309 ms on 10.213.45.51 (100000/100000)
> 16/06/01 16:15:42 INFO YarnClusterScheduler: Removed TaskSet 2.0, whose tasks 
> have all completed, from pool
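
One way to cross-check the UI against the scheduler is to count task-end events 
with a listener; a minimal sketch (class and field names are illustrative):

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Counts task-end events actually delivered by the listener bus. Events
// dropped from a full queue never reach any listener, so a final count lower
// than the task total in the executor logs points at dropped events.
class TaskEndCounter extends SparkListener {
  val finished = new AtomicLong(0)
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    finished.incrementAndGet()
  }
}

Register it with sc.addSparkListener(new TaskEndCounter) before submitting the 
job and compare the final count with the task totals in the logs.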





[jira] [Resolved] (SPARK-22262) Failed to recover Spark Structured Streaming job from checkpoint location

2017-10-13 Thread rohit verma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rohit verma resolved SPARK-22262.
---------------------------------
Resolution: Invalid

Invalid.

> Failed to recover Spark Structured Streaming job from checkpoint location
> -------------------------------------------------------------------------
>
> Key: SPARK-22262
> URL: https://issues.apache.org/jira/browse/SPARK-22262
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
> Environment: Ubuntu Server 16.04 LTS (AMI AWS - EC2)
> servers : 3 x m4.2xlarge
>  
>Reporter: rohit verma
>
> Hello Everyone,
> I am using Structured Streaming + Kafka for real-time data analytics in our 
> project, on Spark 2.2 and Kafka 0.10.2.
> We are facing an issue when streaming queries recover from their checkpoints 
> at application startup. There are multiple streaming queries derived from a 
> single Kafka stream, with a different checkpoint directory for each query. 
> After a job failure, when we restart the job, some of the streaming queries 
> fail to recover from their checkpoint location and throw an *Error reading 
> delta file* exception. To get the job working again I have to clear the 
> checkpoint location and all the data from HDFS; then it works fine. But this 
> should not be necessary, and I lose all the captured data, which I don't 
> want. Here are the logs :
> 
> Job aborted due to stage failure: Task 2 in stage 13.0 failed 4 times, most 
> recent failure: Lost task 2.3 in stage 13.0 (TID 831, 
> ip-172-31-10-246.us-west-2.compute.internal, executor 3): 
> java.lang.IllegalStateException: Error reading delta file 
> /checkpointing/wifiHealthPerUserPerMinute/state/0/2/1.delta of 
> HDFSStateStoreProvider[id = (op=0, part=2), dir = 
> /checkpointing/wifiHealthPerUserPerMinute/state/0/2]: 
> /checkpointing/wifiHealthPerUserPerMinute/state/0/2/1.delta does not exist
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$updateFromDeltaFile(HDFSBackedStateStoreProvider.scala:410)
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:362)
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:360)
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
>   at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> 
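
For context, a minimal sketch of the setup described in this report: one Kafka 
source fanned out to several queries, each with its own checkpoint directory 
(broker, topic, and output paths are illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("multi-query").getOrCreate()

// Single Kafka source shared by several derived streaming queries.
val kafka = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

// Each query needs its own checkpointLocation: offsets and state (the
// state/<operator>/<partition>/<version>.delta files in the stack trace
// above) are laid out per query.
val q1 = kafka.selectExpr("CAST(value AS STRING) AS value")
  .writeStream
  .format("parquet")
  .option("path", "/data/wifiHealth")
  .option("checkpointLocation", "/checkpointing/wifiHealthPerUserPerMinute")
  .start()

Structured Streaming requires a distinct checkpoint directory per query, which 
is the layout the reporter describes.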

[jira] [Created] (SPARK-22262) Failed to recover Spark Structured Streaming job from checkpoint location

2017-10-12 Thread rohit verma (JIRA)
rohit verma created SPARK-22262:
--------------------------------

 Summary: Failed to recover Spark Structured Streaming job from 
checkpoint location
 Key: SPARK-22262
 URL: https://issues.apache.org/jira/browse/SPARK-22262
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.2.0
 Environment: Ubuntu Server 16.04 LTS (AMI AWS - EC2)

servers : 3 x m4.2xlarge
 
Reporter: rohit verma


Hello Everyone,

I am using Structured Streaming + Kafka for real-time data analytics in our 
project, on Spark 2.2 and Kafka 0.10.2.

We are facing an issue when streaming queries recover from their checkpoints at 
application startup. There are multiple streaming queries derived from a single 
Kafka stream, with a different checkpoint directory for each query. After a job 
failure, when we restart the job, some of the streaming queries fail to recover 
from their checkpoint location and throw an *Error reading delta file* 
exception. Here are the logs :



Job aborted due to stage failure: Task 2 in stage 13.0 failed 4 times, most 
recent failure: Lost task 2.3 in stage 13.0 (TID 831, 
ip-172-31-10-246.us-west-2.compute.internal, executor 3): 
java.lang.IllegalStateException: Error reading delta file 
/checkpointing/wifiHealthPerUserPerMinute/state/0/2/1.delta of 
HDFSStateStoreProvider[id = (op=0, part=2), dir = 
/checkpointing/wifiHealthPerUserPerMinute/state/0/2]: 
/checkpointing/wifiHealthPerUserPerMinute/state/0/2/1.delta does not exist
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$updateFromDeltaFile(HDFSBackedStateStoreProvider.scala:410)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:362)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:360)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:360)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
at 
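
A sketch of the restart path where this failure surfaces: start() resumes each 
query from its checkpoint, and a query whose state files are missing aborts, so 
the IllegalStateException above reaches the driver wrapped in a 
StreamingQueryException (the handler below is illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryException

val spark = SparkSession.builder().appName("recovery-watch").getOrCreate()
// ... streaming queries started here, each with its own checkpointLocation ...

try {
  // Blocks until any query terminates; a query that could not rebuild its
  // state from the checkpoint (e.g. a missing .delta file) rethrows its
  // failure here.
  spark.streams.awaitAnyTermination()
} catch {
  case e: StreamingQueryException =>
    System.err.println(s"streaming query failed during recovery: ${e.getMessage}")
    throw e
}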

[jira] [Updated] (SPARK-22262) Failed to recover Spark Structured Streaming job from checkpoint location

2017-10-12 Thread rohit verma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rohit verma updated SPARK-22262:
--------------------------------
Description: 
Hello Everyone,

I am using Structured Streaming + Kafka for real-time data analytics in our 
project, on Spark 2.2 and Kafka 0.10.2.

We are facing an issue when streaming queries recover from their checkpoints at 
application startup. There are multiple streaming queries derived from a single 
Kafka stream, with a different checkpoint directory for each query. After a job 
failure, when we restart the job, some of the streaming queries fail to recover 
from their checkpoint location and throw an *Error reading delta file* 
exception. To get the job working again I have to clear the checkpoint location 
and all the data from HDFS; then it works fine. But this should not be 
necessary, and I lose all the captured data, which I don't want. Here are the 
logs :



Job aborted due to stage failure: Task 2 in stage 13.0 failed 4 times, most 
recent failure: Lost task 2.3 in stage 13.0 (TID 831, 
ip-172-31-10-246.us-west-2.compute.internal, executor 3): 
java.lang.IllegalStateException: Error reading delta file 
/checkpointing/wifiHealthPerUserPerMinute/state/0/2/1.delta of 
HDFSStateStoreProvider[id = (op=0, part=2), dir = 
/checkpointing/wifiHealthPerUserPerMinute/state/0/2]: 
/checkpointing/wifiHealthPerUserPerMinute/state/0/2/1.delta does not exist
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$updateFromDeltaFile(HDFSBackedStateStoreProvider.scala:410)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:362)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:360)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:360)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
at scala.Option.getOrElse(Option.scala:121)
at
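
The workaround described in this report, clearing the checkpoint location, could 
look like the sketch below. Note that it is destructive: offsets and state are 
discarded, which is exactly the data loss the reporter wants to avoid. The path 
is the one from the stack trace; the Hadoop FileSystem API is assumed to be 
available, as it is on any Spark-on-HDFS deployment:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// DESTRUCTIVE: deleting the checkpoint directory discards offsets and state,
// so the restarted job starts from scratch instead of resuming.
val fs = FileSystem.get(new Configuration())
val checkpoint = new Path("/checkpointing/wifiHealthPerUserPerMinute")
if (fs.exists(checkpoint)) {
  fs.delete(checkpoint, true) // recursive delete
}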