Re: Spark hangs at "Removed broadcast_*"

2016-07-12 Thread Sea
Please provide your jstack info. -- Original message -- From: "dhruve ashar"; Sent: July 13, 2016, 3:53; To: "Anton Sviridov"; Cc: "user"; Subject: Re: Spark hangs at "Removed broadcast_*" Looking at the

Re: Spark hangs at "Removed broadcast_*"

2016-07-12 Thread dhruve ashar
Looking at the jstack, it seems that it doesn't contain all the threads. I cannot find the main thread in it. I am not an expert on analyzing jstacks, but are you creating any threads in your code? Are you shutting them down correctly? This one is a non-daemon thread and doesn't seem to be coming from Spa
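
For context on the non-daemon thread question above: a JVM does not exit while any non-daemon thread is still running, so a worker thread that application code starts and never stops can keep a process alive after the job's output is written. The following is a minimal sketch in plain Java (no Spark involved) of how one might list such threads from inside the process. The names `NonDaemonThreads`, `liveNonDaemonThreadNames` and `app-worker` are hypothetical and chosen only for illustration; this is not taken from the application discussed in the thread.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

public class NonDaemonThreads {

    /** Names of live non-daemon threads, excluding the calling thread. */
    static List<String> liveNonDaemonThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && !t.isDaemon() && t != Thread.currentThread())
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch stop = new CountDownLatch(1);

        // Hypothetical worker standing in for a thread created by application code.
        Thread worker = new Thread(() -> {
            try {
                stop.await(); // blocks until someone signals shutdown
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "app-worker");
        worker.start(); // inherits non-daemon status from the main thread

        System.out.println("Live non-daemon threads: " + liveNonDaemonThreadNames());

        // Without this explicit shutdown the JVM would keep running after main returns.
        stop.countDown();
        worker.join();
    }
}
```

Calling `worker.setDaemon(true)` before `start()` would be the other way to let the JVM exit, at the cost of the thread being killed abruptly at shutdown.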

Re: Spark hangs at "Removed broadcast_*"

2016-07-12 Thread Anton Sviridov
Hi. Here are the last few lines before it starts removing broadcasts: 16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task 'attempt_20160723_0009_m_003209_20886' to file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_20

Re: Spark hangs at "Removed broadcast_*"

2016-07-11 Thread dhruve ashar
Hi, can you check from the logs when the job actually finished? The logs provided are too short and do not reveal meaningful information. On Mon, Jul 11, 2016 at 9:50 AM, velvetbaldmime wrote: > Spark 2.0.0-preview > > We've got an app that uses a fairly big broadcast variable. We ru

Spark hangs at "Removed broadcast_*"

2016-07-11 Thread velvetbaldmime
Spark 2.0.0-preview. We've got an app that uses a fairly big broadcast variable. We run this on a big EC2 instance, so deployment is in client mode. The broadcast variable is a massive Map[String, Array[String]]. At the end of saveAsTextFile, the output in the folder seems to be complete and correct