Hello, I’m facing the same (or similar) problem. In my case, the last two tasks hang in a map function following sc.sequenceFile(…). It happens from time to time (more often with TorrentBroadcast than HttpBroadcast) and after restarting it works fine.
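To compare the two broadcast implementations in a reproducible way, the factory can be pinned explicitly before the SparkContext is created. This is only a minimal sketch: it assumes that Spark 0.8.x reads `spark.broadcast.factory` from the JVM system properties and that the two factory class names below match the installed version. The class `BroadcastFactoryConfig` and its helper method are hypothetical names used for illustration.

```java
// Sketch: pin the broadcast implementation before the SparkContext is created.
// Assumption: Spark 0.8.x reads "spark.broadcast.factory" from JVM system properties.
public class BroadcastFactoryConfig {
    static final String KEY = "spark.broadcast.factory";
    // Assumed class names; check them against the Spark version in use.
    static final String HTTP = "org.apache.spark.broadcast.HttpBroadcastFactory";
    static final String TORRENT = "org.apache.spark.broadcast.TorrentBroadcastFactory";

    // Hypothetical helper: sets the factory and returns the value now in effect.
    static String useBroadcastFactory(String factoryClass) {
        System.setProperty(KEY, factoryClass);
        return System.getProperty(KEY);
    }

    public static void main(String[] args) {
        String active = useBroadcastFactory(HTTP);
        System.out.println(KEY + " = " + active);
        // Create the SparkContext only after this point so it sees the setting.
    }
}
```

Running the same job once with each factory should show whether the hang really depends on the broadcast implementation or only on the node layout.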
The problem always happens on the same node: the one that acts as both the master and a worker. Once this node became master-only (i.e., I removed it from conf/slaves), the problem went away. Does that mean that the master and workers have to be on separate nodes?

Best,
Milos

On Jan 6, 2014, at 5:44 PM, Grega Kešpret <[email protected]> wrote:

> Hi,
>
> we are seeing several times a day one worker in a Standalone cluster hang up
> with 100% CPU at the last task and doesn't proceed. After we restart the job,
> it completes successfully.
>
> We are using Spark v0.8.1-incubating.
>
> Attached please find jstack logs of Worker and CoarseGrainedExecutorBackend
> JVM processes.
>
> Grega
> --
> Grega Kešpret
> Analytics engineer
>
> Celtra - Rich Media Mobile Advertising
> celtra.com | @celtramobile
> <logs.zip>
