It sounds like the takeaway is that if you're using custom classes, you need to make sure that their hashCode() and equals() methods are value-based?
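Right — for example, a case class already gives you structural (value-based) equals() and hashCode() for free, while a plain class needs both defined by hand over the same immutable fields. A rough sketch (the Point class here is just a made-up example, not anything from the original code):

// case classes get value-based equals/hashCode automatically
case class PointCC(x: Int, y: Int)

// a hand-written class must override both, consistently, over the same fields
class Point(val x: Int, val y: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: Point => this.x == that.x && this.y == that.y
    case _           => false
  }
  // must agree with equals and be stable (derived only from immutable values)
  override def hashCode: Int = 31 * x + y
}

If hashCode() is derived from mutable state, its value can change after the object has been inserted into a hash table, and probing can then loop over entries that are no longer where their hash says they should be — which sounds like the symptom described below.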
On Thu, Jan 16, 2014 at 12:08 PM, Patrick Wendell <[email protected]> wrote:

> Thanks for following up and explaining this one! Definitely something
> other users might run into...
>
>
> On Thu, Jan 16, 2014 at 5:58 AM, Grega Kešpret <[email protected]> wrote:
>
>> Just to follow up, we have since pinpointed the problem to be in
>> application code (not Spark). In some cases, there was an infinite loop in
>> the Scala HashTable linear-probing algorithm, where an element's next() pointed
>> at itself. It was probably caused by wrong hashCode() and equals() methods
>> on the object we were storing.
>>
>> Milos, we also have the Master node separate from the Worker nodes. Could someone
>> from the Spark team comment on that?
>>
>> Grega
>> --
>> *Grega Kešpret*
>> Analytics engineer
>>
>> Celtra — Rich Media Mobile Advertising
>> celtra.com <http://www.celtra.com/> | @celtramobile <http://www.twitter.com/celtramobile>
>>
>>
>> On Thu, Jan 16, 2014 at 2:46 PM, Milos Nikolic <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I'm facing the same (or similar) problem. In my case, the last two tasks
>>> hang in a map function following sc.sequenceFile(…). It happens from time
>>> to time (more often with TorrentBroadcast than HttpBroadcast), and after
>>> restarting it works fine.
>>>
>>> The problem always happens on the same node — on the node that plays the
>>> roles of the master and one worker. Once this node became master-only
>>> (i.e., I removed it from conf/slaves), the problem was gone.
>>>
>>> Does that mean that the master and workers have to be on separate nodes?
>>>
>>> Best,
>>> Milos
>>>
>>>
>>> On Jan 6, 2014, at 5:44 PM, Grega Kešpret <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> several times a day we are seeing one worker in a Standalone cluster
>>> hang with 100% CPU at the last task and not proceed. After we
>>> restart the job, it completes successfully.
>>>
>>> We are using Spark v0.8.1-incubating.
>>>
>>> Attached please find jstack logs of the Worker
>>> and CoarseGrainedExecutorBackend JVM processes.
>>>
>>> Grega
>>> --
>>> *Grega Kešpret*
>>> Analytics engineer
>>>
>>> Celtra — Rich Media Mobile Advertising
>>> celtra.com <http://www.celtra.com/> | @celtramobile <http://www.twitter.com/celtramobile>
