Hi Avery,
The following is the error from one of the failed tasks:
May 7, 2013 2:34:26 PM org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink
WARNING: Failed to accept a connection.
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:657)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1336)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.start(AbstractNioWorker.java:179)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.register(AbstractNioWorker.java:141)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.registerAcceptedChannel(NioServerSocketPipelineSink.java:277)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:239)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
13/05/07 14:34:26 INFO worker.BspServiceWorker: startSuperstep: Master(hostname=lvshdc5dn0020.qa.paypal.com, MRtaskID=44, port=30044)
13/05/07 14:34:26 INFO worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201305061811_0012/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
May 7, 2013 2:34:26 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0x45c3e9ba] EXCEPTION: java.lang.OutOfMemoryError: unable to create new native thread)
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:657)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1325)
    at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor.doUnorderedExecute(MemoryAwareThreadPoolExecutor.java:452)
    at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor.doExecute(MemoryAwareThreadPoolExecutor.java:445)
    at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor.execute(MemoryAwareThreadPoolExecutor.java:437)
    at org.jboss.netty.handler.execution.ExecutionHandler.handleUpstream(ExecutionHandler.java:172)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:378)
    at org.apache.giraph.comm.netty.ByteCounter.handleUpstream(ByteCounter.java:116)
    at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:533)
    at org.jboss.netty.channel.Channels$7.run(Channels.java:507)
    at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:41)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processEventQueue(AbstractNioWorker.java:373)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:254)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
May 7, 2013 2:34:26 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0x45c3e9
Thanks
Arun Ramani
From: Avery Ching <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, May 7, 2013 2:29 PM
To: "[email protected]" <[email protected]>
Subject: Re: Giraph and Fair Scheduler
Can you check the logs of the failed task and report what the error is?
Avery
On 5/7/13 2:26 PM, Ramani, Arun wrote:
Hi Avery,
I am setting "minsharepreemptiontimeout" to 5 sec, and my Giraph job could not
even wait 5 seconds to get its slots. Let me explain the scenario below:
Assume the cluster capacity is 150 map slots.
Queue A (min share: 10 maps) - I submit a sleep job with 100 map tasks. The
cluster is empty, so this first job submitted to Queue A takes all 100 map
tasks.
Queue B (Giraph pool, min share: 140 maps) - With job 1 running and 100 slots
occupied, I submit a Giraph shortest-paths example job with 100 workers to
Queue B. Queue B has "minsharepreemptiontimeout" set to 5 sec. The scheduler
first assigns 50 tasks, since the first job took only 100 of the cluster's 150
slots. Within 5 seconds, 50 more tasks are preempted from Queue A and handed to
the Giraph job. I see this happening; however, the job fails with an "unable to
create new native thread" error.
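For reference, this is roughly how the job is set up on my side. This is only a
minimal sketch in Java: "mapred.fairscheduler.pool" is my assumption about how
jobs are mapped to pools in our setup, the pool name is illustrative, and the
worker-count properties are written from memory; the values just mirror the
scenario above.

    import org.apache.hadoop.conf.Configuration;

    public class GiraphQueueBSubmissionSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Place the job in Queue B (the Giraph pool with min share 140 maps
            // and minsharepreemptiontimeout=5). The property name and pool name
            // are assumptions about our fair-scheduler setup.
            conf.set("mapred.fairscheduler.pool", "giraph");
            // 100 workers, as in the scenario above (plus the master task).
            conf.setInt("giraph.minWorkers", 100);
            conf.setInt("giraph.maxWorkers", 100);
            System.out.println("pool=" + conf.get("mapred.fairscheduler.pool")
                    + " workers=" + conf.getInt("giraph.maxWorkers", 0));
        }
    }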
Please let me know if "giraph.maxMasterSuperstepWaitMsecs" will help in this
scenario.
Thanks so much
Arun Ramani
From: Avery Ching <[email protected]>
Date: Tuesday, May 7, 2013 2:19 PM
To: "[email protected]" <[email protected]>
Cc: "Ramani, Arun (aramani)" <[email protected]>
Subject: Re: Giraph and Fair Scheduler
Oh, I see. You can change how long the Giraph job waits for its tasks before
giving up. Try setting giraph.maxMasterSuperstepWaitMsecs to a higher number;
the default is 10 minutes.
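If you configure the job programmatically, a minimal sketch in Java would look
like this (the property name is the one above; the 30-minute value is only
illustrative):

    import org.apache.hadoop.conf.Configuration;

    public class RaiseGiraphWaitTimeout {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Give the job longer to wait for its tasks before giving up.
            // 30 minutes here is just an example; the default is 10 minutes.
            conf.setLong("giraph.maxMasterSuperstepWaitMsecs", 30L * 60L * 1000L);
            System.out.println("giraph.maxMasterSuperstepWaitMsecs = "
                    + conf.getLong("giraph.maxMasterSuperstepWaitMsecs", 0L));
            // Pass this Configuration into your normal Giraph job setup.
        }
    }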
Avery
On 5/7/13 2:10 PM, Ramani, Arun wrote:
Hi Avery,
I am not preempting tasks out of the Giraph pool. I have configured preemption
so that any job submitted to the Giraph pool will get its min share. Any
suggestions on how to make this work?
Thanks so much in advance.
Arun Ramani
From: Avery Ching <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, May 7, 2013 7:25 AM
To: "[email protected]" <[email protected]>
Subject: Re: Giraph and Fair Scheduler
Can you disable preemption for the Giraph pool? It's not great to preempt
those tasks.
Avery
On 5/6/13 6:37 PM, Ramani, Arun wrote:
Hi,
I am running the Fair Scheduler with many applications from the Hadoop stack in
my cluster (Pig, Hive, HBase, etc.). I have dedicated a pool to Giraph and want
to run Giraph alongside those other applications. I have configured preemption
and set "minsharepreemptiontimeout=5" (in seconds; how long jobs submitted to
this pool wait to get the pool's min share).
I am trying to run Giraph in this mode. I see that jobs from other pools are
getting preempted to give the Giraph job's pool its configured min share, but
my job fails with an "unable to create new native thread" error. The same job
passes if the slots are available immediately, without having to wait for tasks
from other queues to be preempted. I also tried tweaking
"giraph.minPercentResponded=50.0f", but my Giraph job still fails. Please help
with this scenario.
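The only Giraph-side knob I have touched is that response threshold. A minimal
sketch of what I tried, in Java (50.0f is the value mentioned above, everything
else is left at its default, and the comment reflects my understanding of the
property rather than its documented definition):

    import org.apache.hadoop.conf.Configuration;

    public class GiraphMinPercentRespondedSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // As I understand it, this lets the job proceed once at least 50%
            // of the requested workers have reported in, instead of waiting
            // for all of them. It did not avoid the native-thread error.
            conf.setFloat("giraph.minPercentResponded", 50.0f);
            System.out.println("giraph.minPercentResponded = "
                    + conf.getFloat("giraph.minPercentResponded", 100.0f));
        }
    }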
Basically, I want to know how to configure Giraph to wait, up to some
threshold, for its slots to become available through preemption.
Thanks
Arun Ramani