Hi,

Can you take a thread dump and verify it?
jstack <pid> > RM.out
OR
kill -3 <pid>
(Note: with kill -3 the thread dump is written to the RM .out file)

Thanks & Regards
Rohith Sharma K S

> On May 29, 2015, at 8:43 AM, jason lu <[email protected]> wrote:
>
> Hi,
> I met the same problem as:
> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201303.mbox/%[email protected]%3E
>
> Any idea about that?
> It happens about every 3 or 4 weeks in my cluster (about 150 nodes).
> I checked the log: no warn, no error, no exception, but the ResourceManager
> hung; it did not crash.
>
> I found this code, but I have no idea why it happens. Why does the event
> queue get bigger and bigger?
>
> thanks.
>
>   private final class EventProcessor implements Runnable {
>     @Override
>     public void run() {
>
>       SchedulerEvent event;
>
>       while (!stopped && !Thread.currentThread().isInterrupted()) {
>         try {
>           event = eventQueue.take();
>         } catch (InterruptedException e) {
>           LOG.error("Returning, interrupted : " + e);
>           return; // TODO: Kill RM.
>         }
>
>         try {
>           scheduler.handle(event);
>         } catch (Throwable t) {
>           // An error occurred, but we are shutting down anyway.
>           // If it was an InterruptedException, the very act of
>           // shutdown could have caused it and is probably harmless.
>           if (stopped) {
>             LOG.warn("Exception during shutdown: ", t);
>             break;
>           }
>           LOG.fatal("Error in handling event type " + event.getType()
>               + " to the scheduler", t);
>           if (shouldExitOnError
>               && !ShutdownHookManager.get().isShutdownInProgress()) {
>             LOG.info("Exiting, bbye..");
>             System.exit(-1);
>           }
>         }
>       }
>     }
>   }
>
>   @Override
>   protected void serviceStop() throws Exception {
>     this.stopped = true;
>     this.eventProcessor.interrupt();
>     try {
>       this.eventProcessor.join();
>     } catch (InterruptedException e) {
>       throw new YarnRuntimeException(e);
>     }
>     super.serviceStop();
>   }
>
>   @Override
>   public void handle(SchedulerEvent event) {
>     try {
>       int qSize = eventQueue.size();
>       if (qSize !=0 && qSize %1000 == 0) {
>         LOG.info("Size of scheduler event-queue is " + qSize);
>       }
>       int remCapacity = eventQueue.remainingCapacity();
>       if (remCapacity < 1000) {
>         LOG.info("Very low remaining capacity on scheduler event queue: "
>             + remCapacity);
>       }
>       this.eventQueue.put(event);
>     } catch (InterruptedException e) {
>       throw new YarnRuntimeException(e);
>     }
>   }
> }
>
> logs:
>
> grep 'Size of event-queue' yarn-hadoop-resourcemanager-gdc-hm01-formal.i.nease.net.log
> 2015-05-29 00:54:46,985 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1000
> 2015-05-29 00:55:28,850 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2000
> 2015-05-29 00:56:10,204 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 3000
> 2015-05-29 00:56:51,995 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4000
> 2015-05-29 00:57:33,981 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 5000
> 2015-05-29 00:58:15,324 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 6000
> 2015-05-29 00:58:57,111 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 7000
> 2015-05-29 00:59:38,593 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 8000
> 2015-05-29 01:00:20,215 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 9000
> 2015-05-29 01:01:00,559 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 10000
> 2015-05-29 01:01:39,614 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 11000
> 2015-05-29 01:02:21,364 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 12000
> 2015-05-29 01:03:03,233 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 13000
> 2015-05-29 01:03:44,701 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 14000
> 2015-05-29 01:04:26,494 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 15000
> 2015-05-29 01:05:08,180 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 16000
> 2015-05-29 01:05:50,331 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 17000
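
For illustration only, here is a minimal standalone sketch of what a thread dump can reveal in a situation like this. It is not ResourceManager code: the class StuckDispatcherDemo and all of its names are made up, and it assumes the queue grows because the single consumer thread is stuck inside a handler, which is only one possible cause. Producers keep calling put() while the consumer never returns from its first event, and the consumer's stack (the same information jstack prints for each thread) shows where it is blocked.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; none of these names come from Hadoop.
final class StuckDispatcherDemo {
    // Unbounded queue standing in for a dispatcher's event queue.
    final BlockingQueue<Integer> eventQueue = new LinkedBlockingQueue<>();
    // Signals that the consumer has entered its (simulated) handler.
    final CountDownLatch handlerEntered = new CountDownLatch(1);
    // Never counted down: simulates a handler call that never returns.
    final CountDownLatch neverReleased = new CountDownLatch(1);
    final Thread consumer = new Thread(this::drain, "EventProcessor-demo");

    // Consumer loop: takes one event, then hangs inside the "handler".
    void drain() {
        try {
            while (true) {
                eventQueue.take();
                handlerEntered.countDown();
                neverReleased.await(); // assumption: the handler blocks forever
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void start() {
        consumer.setDaemon(true); // let the JVM exit even though the thread hangs
        consumer.start();
    }

    // Returns the consumer's name, state, and stack frames, similar in
    // spirit to the per-thread section of a jstack dump.
    String dumpConsumer() throws InterruptedException {
        handlerEntered.await();
        while (consumer.getState() != Thread.State.WAITING) {
            Thread.sleep(5); // wait until the consumer has actually parked
        }
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(consumer.getName()).append("\" state=")
          .append(consumer.getState()).append('\n');
        for (StackTraceElement frame : consumer.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        StuckDispatcherDemo demo = new StuckDispatcherDemo();
        demo.start();
        for (int i = 0; i < 5000; i++) {
            demo.eventQueue.put(i); // producers are never blocked
        }
        String dump = demo.dumpConsumer();
        System.out.println("queue size = " + demo.eventQueue.size());
        System.out.print(dump);
    }
}
```

In this sketch the queue keeps all but the one event the consumer took, and the dump shows the consumer parked in CountDownLatch.await. In the logs above the growing queue is reported by AsyncDispatcher, so in a real dump the thread to look at would presumably be that dispatcher's consumer thread; that is an inference from the log lines, not something confirmed in this thread.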
