Hi.

Here are the last few lines before it starts removing broadcasts:

16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task
'attempt_201607111123_0009_m_003209_20886' to
file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003209
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil:
attempt_201607111123_0009_m_003209_20886: Committed
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3211.0 in stage 9.0
(TID 20888) in 95 ms on localhost (3209/3214)
16/07/11 14:02:11 INFO Executor: Finished task 3209.0 in stage 9.0 (TID
20886). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3209.0 in stage 9.0
(TID 20886) in 103 ms on localhost (3210/3214)
16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task
'attempt_201607111123_0009_m_003208_20885' to
file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003208
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil:
attempt_201607111123_0009_m_003208_20885: Committed
16/07/11 14:02:11 INFO Executor: Finished task 3208.0 in stage 9.0 (TID
20885). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3208.0 in stage 9.0
(TID 20885) in 109 ms on localhost (3211/3214)
16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task
'attempt_201607111123_0009_m_003212_20889' to
file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003212
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil:
attempt_201607111123_0009_m_003212_20889: Committed
16/07/11 14:02:11 INFO Executor: Finished task 3212.0 in stage 9.0 (TID
20889). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3212.0 in stage 9.0
(TID 20889) in 84 ms on localhost (3212/3214)
16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task
'attempt_201607111123_0009_m_003210_20887' to
file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003210
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil:
attempt_201607111123_0009_m_003210_20887: Committed
16/07/11 14:02:11 INFO Executor: Finished task 3210.0 in stage 9.0 (TID
20887). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3210.0 in stage 9.0
(TID 20887) in 100 ms on localhost (3213/3214)
16/07/11 14:02:11 INFO FileOutputCommitter: File Output Committer Algorithm
version is 1
16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task
'attempt_201607111123_0009_m_003213_20890' to
file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003213
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil:
attempt_201607111123_0009_m_003213_20890: Committed
16/07/11 14:02:11 INFO Executor: Finished task 3213.0 in stage 9.0 (TID
20890). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3213.0 in stage 9.0
(TID 20890) in 82 ms on localhost (3214/3214)
16/07/11 14:02:11 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks
have all completed, from pool
*16/07/11 14:02:11 INFO DAGScheduler: ResultStage 9 (saveAsTextFile at
SfCountsDumper.scala:13) finished in 42.294 s*
*16/07/11 14:02:11 INFO DAGScheduler: Job 1 finished: saveAsTextFile at
SfCountsDumper.scala:13, took 9517.124624 s*
16/07/11 14:28:46 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
10.101.230.154:35192 in memory (size: 15.8 KB, free: 37.1 GB)
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 7
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 6
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 5
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 4
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 3
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 2
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 1
16/07/11 14:28:46 INFO BlockManager: Removing RDD 14
16/07/11 14:28:46 INFO ContextCleaner: Cleaned RDD 14
16/07/11 14:28:46 INFO BlockManagerInfo: Removed broadcast_11_piece0 on
10.101.230.154:35192 in memory (size: 25.5 KB, free: 37.1 GB)
...

In fact, the application is still running: Spark's UI shows an uptime of 20.6
hours, with the last job having finished at least 18 hours ago.
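
To answer the timing question directly, the idle gap between "Job 1 finished"
and the first "Removed broadcast" line can be computed from the timestamps in
the excerpt above. This is only a minimal sketch: it assumes the log timestamps
use the yy/MM/dd HH:mm:ss layout, and the variable names are mine, not anything
from the application.

```python
from datetime import datetime, timedelta

# Assumed layout of the timestamps in the driver log: yy/MM/dd HH:mm:ss.
LOG_TIME_FORMAT = "%y/%m/%d %H:%M:%S"

# Timestamps copied from the log excerpt above.
job_finished = datetime.strptime("16/07/11 14:02:11", LOG_TIME_FORMAT)
first_cleanup = datetime.strptime("16/07/11 14:28:46", LOG_TIME_FORMAT)

# Time the driver sat idle between "Job 1 finished" and the first
# "Removed broadcast" line.
idle_gap = first_cleanup - job_finished

# Job duration in seconds, as reported by DAGScheduler for Job 1.
job_duration = timedelta(seconds=9517.124624)

print("idle gap before cleanup:", idle_gap)
print("reported job duration:  ", job_duration)
```

So the job itself reports finishing at 14:02:11, and the cleanup lines only
start appearing about 26 and a half minutes later.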

On Mon, 11 Jul 2016 at 23:23 dhruve ashar <dhruveas...@gmail.com> wrote:

> Hi,
>
> Can you check from the logs when the job actually finished? The logs
> provided are too short and do not reveal meaningful information.
>
>
>
> On Mon, Jul 11, 2016 at 9:50 AM, velvetbaldmime <keyn...@gmail.com> wrote:
>
>> Spark 2.0.0-preview
>>
>> We've got an app that uses a fairly big broadcast variable. We run this on
>> a big EC2 instance, so deployment is in client mode. The broadcast variable
>> is a massive Map[String, Array[String]].
>>
>> At the end of saveAsTextFile, the output in the folder seems to be complete
>> and correct (apart from .crc files still being there), but the spark-submit
>> process is stuck, seemingly on removing the broadcast variable. The stuck
>> logs look like this: http://pastebin.com/wpTqvArY
>>
>> My last run lasted for 12 hours after doing saveAsTextFile - just sitting
>> there. I did a jstack on the driver process; most threads are parked:
>> http://pastebin.com/E29JKVT7
>>
>> Full story: We used this code with Spark 1.5.0 and it worked, but then the
>> data changed and something stopped fitting into Kryo's serialisation buffer.
>> Increasing it didn't help, so I had to disable the KryoSerializer. Tested it
>> again - it hung. Switched to 2.0.0-preview - seems like the same issue.
>>
>> I'm not quite sure what's even going on, given that there's almost no CPU
>> activity and no output in the logs, yet the output is not finalised like it
>> used to be.
>>
>> Would appreciate any help, thanks
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-hangs-at-Removed-broadcast-tp27320.html
>>
>>
>
>
> --
> -Dhruve Ashar
>
>
