Re: Worker hangs with 100% CPU in Standalone cluster

Milos Nikolic Thu, 16 Jan 2014 05:47:48 -0800

Hello,

I’m facing the same (or similar) problem. In my case, the last two tasks hang 
in a map function following sc.sequenceFile(…). It happens from time to time 
(more often with TorrentBroadcast than HttpBroadcast) and after restarting it 
works fine.
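
For anyone comparing the two broadcast implementations, here is a minimal sketch of how the choice can be pinned. It assumes Spark 0.8.x, where settings are read from Java system properties and the `spark.broadcast.factory` property selects the implementation. The object name, app name, `local[2]` master, and sample data are placeholders for illustration, not taken from my job.

```scala
import org.apache.spark.SparkContext

object BroadcastFactoryExample {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 0.8.x reads configuration from system properties,
    // so this must be set before the SparkContext is created.
    System.setProperty(
      "spark.broadcast.factory",
      "org.apache.spark.broadcast.HttpBroadcastFactory")

    // Placeholder master and app name; replace with the real cluster URL.
    val sc = new SparkContext("local[2]", "BroadcastFactoryExample")

    // Small broadcast variable that is read inside a map function.
    val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))
    val mapped = sc.parallelize(1 to 4).map(i => lookup.value.getOrElse(i, "?"))

    // Force evaluation so the broadcast is actually fetched by the tasks.
    println(mapped.count())

    sc.stop()
  }
}
```

Swapping the class name for `org.apache.spark.broadcast.TorrentBroadcastFactory` selects the other implementation, which makes it easier to check whether the hang tracks the broadcast type.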

The problem always happens on the same node: the one that acts as both the 
master and a worker. Once this node became master-only (i.e., I removed it 
from conf/slaves), the problem went away. 

Does that mean that the master and workers have to be on separate nodes? 

Best,
Milos

On Jan 6, 2014, at 5:44 PM, Grega Kešpret <[email protected]> wrote:

> Hi,
> 
> Several times a day, we see one worker in a Standalone cluster hang at 100% 
> CPU on the last task and fail to proceed. After we restart the job, it 
> completes successfully.
> 
> We are using Spark v0.8.1-incubating.
> 
> Attached please find jstack logs of Worker and CoarseGrainedExecutorBackend 
> JVM processes.
> 
> Grega
> --
> Grega Kešpret
> Analytics engineer
> 
> Celtra — Rich Media Mobile Advertising
> celtra.com | @celtramobile
> <logs.zip>
