Oh, I see. I ran jstack on both a cluster of machines and a single machine, but I'm not quite sure how to interpret the output. My best guess is that there might be a deadlock: there's just a bunch of Netty threads waiting. The links to the jstack dumps:
http://pastebin.com/0cLuaF07 (PageRank, single worker, amazon0505 graph from SNAP)
http://pastebin.com/MNEUELui (MST, from one of the 64 workers, com-orkut graph from SNAP)

Any idea what's happening? Or anything in particular I should look for next?

Thanks,
Young

On Mon, Mar 17, 2014 at 12:19 PM, Avery Ching <[email protected]> wrote:

> Hi Young,
>
> Our Hadoop instance (Corona) kills processes after they finish executing
> so we don't see this. You might want to do a jstack to see where it's hung
> up on and figure out the issue.
>
> Thanks
>
> Avery
>
>
> On 3/17/14, 7:56 AM, Young Han wrote:
>
>> Hi all,
>>
>> With Giraph 1.0.0, I've noticed an issue where the Java process
>> corresponding to the job loiters around indefinitely even after the job
>> completes (successfully). The process consumes memory but not CPU time.
>> This happens on both a single machine and clusters of machines (in which
>> case every worker has the issue). The only way I know of fixing this is
>> killing the Java process manually---restarting or stopping Hadoop does not
>> help.
>>
>> Is this some known bug or a configuration issue on my end?
>>
>> Thanks,
>> Young
>>
>
>
