Hi Avery,
The following is the error from one of the failed tasks:
May 7, 2013 2:34:26 PM org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink
WARNING: Failed to accept a connection.
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:657)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1336)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.start(AbstractNioWorker.java:179)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.register(AbstractNioWorker.java:141)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.registerAcceptedChannel(NioServerSocketPipelineSink.java:277)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:239)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
13/05/07 14:34:26 INFO worker.BspServiceWorker: startSuperstep: Master(hostname=lvshdc5dn0020.qa.paypal.com, MRtaskID=44, port=30044)
13/05/07 14:34:26 INFO worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201305061811_0012/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
May 7, 2013 2:34:26 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0x45c3e9ba] EXCEPTION: java.lang.OutOfMemoryError: unable to create new native thread)
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:657)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1325)
    at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor.doUnorderedExecute(MemoryAwareThreadPoolExecutor.java:452)
    at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor.doExecute(MemoryAwareThreadPoolExecutor.java:445)
    at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor.execute(MemoryAwareThreadPoolExecutor.java:437)
    at org.jboss.netty.handler.execution.ExecutionHandler.handleUpstream(ExecutionHandler.java:172)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:378)
    at org.apache.giraph.comm.netty.ByteCounter.handleUpstream(ByteCounter.java:116)
    at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:533)
    at org.jboss.netty.channel.Channels$7.run(Channels.java:507)
    at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:41)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processEventQueue(AbstractNioWorker.java:373)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:254)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
May 7, 2013 2:34:26 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0x45c3e9
Thanks
Arun Ramani
From: Avery Ching <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, May 7, 2013 2:29 PM
To: "[email protected]" <[email protected]>
Subject: Re: Giraph and Fair Scheduler
Can you check the logs of the failed task and report what the error is?
Avery
On 5/7/13 2:26 PM, Ramani, Arun wrote:
Hi Avery,
I am setting "minsharepreemptiontimeout" to 5 sec, and my Giraph job could not
even wait 5 seconds to get its slots. Let me explain the scenario below:
Assume the cluster capacity is 150 map slots.
Queue A (min share: 10 maps) - I submit a sleep job with 100 map tasks. The
cluster is empty, so this first job submitted to Queue A takes all 100 map
tasks.
Queue B (Giraph pool, min share: 140 maps) - With job 1 running and 100 slots
occupied, I submit a Giraph shortest-paths example job with 100 workers to
Queue B. Queue B has "minsharepreemptiontimeout" set to 5 sec. The scheduler
first assigns 50 tasks, since the first job took only 100 of the cluster's 150
slots. Within 5 seconds, 50 more tasks are preempted from Queue A and handed to
the Giraph job. I see this happening; however, the job fails with an "unable to
create new native thread" error.
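For reference, this is roughly how the job is set up on my side. This is only a
minimal sketch in Java: "mapred.fairscheduler.pool" is my assumption about how
jobs are mapped to pools in our setup, the pool name is illustrative, and the
worker-count properties are written from memory; the values just mirror the
scenario above.

    import org.apache.hadoop.conf.Configuration;

    public class GiraphQueueBSubmissionSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Place the job in Queue B (the Giraph pool with min share 140 maps
            // and minsharepreemptiontimeout=5). The property name and pool name
            // are assumptions about our fair-scheduler setup.
            conf.set("mapred.fairscheduler.pool", "giraph");
            // 100 workers, as in the scenario above (plus the master task).
            conf.setInt("giraph.minWorkers", 100);
            conf.setInt("giraph.maxWorkers", 100);
            System.out.println("pool=" + conf.get("mapred.fairscheduler.pool")
                    + " workers=" + conf.getInt("giraph.maxWorkers", 0));
        }
    }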
Please let me know if "giraph.maxMasterSuperstepWaitMsecs" will help in this
scenario.
Thanks so much
Arun Ramani
From: Avery Ching <[email protected]>
Date: Tuesday, May 7, 2013 2:19 PM
To: "[email protected]" <[email protected]>
Cc: "Ramani, Arun (aramani)" <[email protected]>
Subject: Re: Giraph and Fair Scheduler
Oh, I see. You can change how long the Giraph job waits for its tasks before
giving up. Try setting giraph.maxMasterSuperstepWaitMsecs to a higher number;
the default is 10 minutes.
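If you configure the job programmatically, a minimal sketch in Java would look
like this (the property name is the one above; the 30-minute value is only
illustrative):

    import org.apache.hadoop.conf.Configuration;

    public class RaiseGiraphWaitTimeout {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Give the job longer to wait for its tasks before giving up.
            // 30 minutes here is just an example; the default is 10 minutes.
            conf.setLong("giraph.maxMasterSuperstepWaitMsecs", 30L * 60L * 1000L);
            System.out.println("giraph.maxMasterSuperstepWaitMsecs = "
                    + conf.getLong("giraph.maxMasterSuperstepWaitMsecs", 0L));
            // Pass this Configuration into your normal Giraph job setup.
        }
    }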
Avery
On 5/7/13 2:10 PM, Ramani, Arun wrote:
Hi Avery,
I am not preempting tasks out of the Giraph pool. I have configured preemption
so that any job submitted to the Giraph pool will get its min share. Any
suggestions on how to make this work?
Thanks so much in advance.
Arun Ramani
From: Avery Ching <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, May 7, 2013 7:25 AM
To: "[email protected]" <[email protected]>
Subject: Re: Giraph and Fair Scheduler
Can you disable preemption for the Giraph pool? It's not great to preempt
those tasks.
Avery
On 5/6/13 6:37 PM, Ramani, Arun wrote:
Hi,
I am running the Fair Scheduler with many applications from the Hadoop stack in
my cluster (Pig, Hive, HBase, etc.). I have dedicated a pool to Giraph and want
to run Giraph alongside those other applications. I have configured preemption
and set "minsharepreemptiontimeout=5" (in seconds; how long jobs submitted to
this pool wait to get the pool's min share).
I am trying to run Giraph in this mode. I see that jobs from other pools are
getting preempted to give the Giraph job's pool its configured min share, but
my job fails with an "unable to create new native thread" error. The same job
passes if the slots are available immediately, without having to wait for tasks
from other queues to be preempted. I also tried tweaking
"giraph.minPercentResponded=50.0f", but my Giraph job still fails. Please help
with this scenario.
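The only Giraph-side knob I have touched is that response threshold. A minimal
sketch of what I tried, in Java (50.0f is the value mentioned above, everything
else is left at its default, and the comment reflects my understanding of the
property rather than its documented definition):

    import org.apache.hadoop.conf.Configuration;

    public class GiraphMinPercentRespondedSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // As I understand it, this lets the job proceed once at least 50%
            // of the requested workers have reported in, instead of waiting
            // for all of them. It did not avoid the native-thread error.
            conf.setFloat("giraph.minPercentResponded", 50.0f);
            System.out.println("giraph.minPercentResponded = "
                    + conf.getFloat("giraph.minPercentResponded", 100.0f));
        }
    }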
Basically, I want to know how to configure Giraph to wait, up to some
threshold, for its slots to become available through preemption.
Thanks
Arun Ramani