I forgot to do that before restarting the process.
> On May 29, 2015, at 11:17, Rohith Sharma <[email protected]> wrote:
>
> Hi
>
> Can you take a thread dump and verify it?
>
> jstack <pid> > RM.out
> OR
> kill -3 <pid> (Note: the thread dump will be written to the .out file)
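Both commands capture a thread dump of the ResourceManager from outside its JVM. To illustrate what such a dump contains, here is a minimal sketch that builds a similar dump from inside a JVM with the standard `java.lang.management` API. It is not a substitute for running `jstack` against the RM's pid, and the class name `ThreadDumpSketch` is hypothetical. In a dump of a hung RM, the useful part is which threads are BLOCKED or WAITING and on which lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: prints a jstack-like dump of the *current* JVM.
public class ThreadDumpSketch {

  public static String dump() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    StringBuilder sb = new StringBuilder();
    // true, true: also collect locked monitors and ownable synchronizers
    for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
      sb.append('"').append(info.getThreadName()).append("\" ")
        .append(info.getThreadState());
      if (info.getLockName() != null) {
        // the lock this thread is waiting for, if any
        sb.append(" on ").append(info.getLockName());
      }
      sb.append('\n');
      for (StackTraceElement frame : info.getStackTrace()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(dump());
  }
}
```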
>
> Thanks & Regards
> Rohith Sharma K S
>
>> On May 29, 2015, at 8:43 AM, jason lu <[email protected]> wrote:
>>
>>
>> Hi,
>> I have run into the same problem as:
>> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201303.mbox/%[email protected]%3E
>>
>> Any idea about that?
>> It happens roughly every 3 or 4 weeks in my cluster (about 150 nodes).
>> I checked the logs: no warnings, no errors, no exceptions, but the
>> ResourceManager hung; it did not crash.
>>
>> I found this code, but I have no idea why this happens: why does the
>> event queue keep getting bigger and bigger?
>>
>> Thanks.
>>
>>   private final class EventProcessor implements Runnable {
>>     @Override
>>     public void run() {
>>
>>       SchedulerEvent event;
>>
>>       while (!stopped && !Thread.currentThread().isInterrupted()) {
>>         try {
>>           event = eventQueue.take();
>>         } catch (InterruptedException e) {
>>           LOG.error("Returning, interrupted : " + e);
>>           return; // TODO: Kill RM.
>>         }
>>
>>         try {
>>           scheduler.handle(event);
>>         } catch (Throwable t) {
>>           // An error occurred, but we are shutting down anyway.
>>           // If it was an InterruptedException, the very act of
>>           // shutdown could have caused it and is probably harmless.
>>           if (stopped) {
>>             LOG.warn("Exception during shutdown: ", t);
>>             break;
>>           }
>>           LOG.fatal("Error in handling event type " + event.getType()
>>               + " to the scheduler", t);
>>           if (shouldExitOnError
>>               && !ShutdownHookManager.get().isShutdownInProgress()) {
>>             LOG.info("Exiting, bbye..");
>>             System.exit(-1);
>>           }
>>         }
>>       }
>>     }
>>   }
>>
>>   @Override
>>   protected void serviceStop() throws Exception {
>>     this.stopped = true;
>>     this.eventProcessor.interrupt();
>>     try {
>>       this.eventProcessor.join();
>>     } catch (InterruptedException e) {
>>       throw new YarnRuntimeException(e);
>>     }
>>     super.serviceStop();
>>   }
>>
>>   @Override
>>   public void handle(SchedulerEvent event) {
>>     try {
>>       int qSize = eventQueue.size();
>>       if (qSize != 0 && qSize % 1000 == 0) {
>>         LOG.info("Size of scheduler event-queue is " + qSize);
>>       }
>>       int remCapacity = eventQueue.remainingCapacity();
>>       if (remCapacity < 1000) {
>>         LOG.info("Very low remaining capacity on scheduler event queue: "
>>             + remCapacity);
>>       }
>>       this.eventQueue.put(event);
>>     } catch (InterruptedException e) {
>>       throw new YarnRuntimeException(e);
>>     }
>>   }
>> }
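In the code above, a single `EventProcessor` thread takes one event at a time and calls `scheduler.handle(event)` synchronously, while `handle()` keeps enqueuing from other threads. If that one call is slow or never returns (for example because it is waiting on a lock, which is a hypothetical cause that a thread dump would confirm or rule out), nothing else is drained and the queue only grows, with no error logged. This is a minimal sketch of that effect using a plain `LinkedBlockingQueue`; the class and method names are hypothetical and the "stuck handler" is simulated with a latch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of one consumer thread whose handler stops returning.
public class StalledConsumerSketch {

  // Returns the backlog after `events` puts (events must be >= 1)
  // while the single consumer is stuck inside its first handle call.
  public static int backlogWhenHandlerStalls(int events) throws InterruptedException {
    BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
    CountDownLatch handlerEntered = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1); // never counted down

    // Like EventProcessor: take() one event, handle it, then take the next.
    Thread consumer = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          queue.take();
          handlerEntered.countDown();
          release.await(); // simulated handle() call that never returns
        }
      } catch (InterruptedException e) {
        // interrupted by the caller below: exit the loop
      }
    });
    consumer.start();

    for (int i = 0; i < events; i++) {
      queue.put(i); // producers keep enqueuing; nothing fails
    }
    handlerEntered.await(); // the consumer has taken exactly one event
    int backlog = queue.size();

    consumer.interrupt();
    consumer.join();
    return backlog;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(backlogWhenHandlerStalls(5000)); // prints 4999
  }
}
```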
>>
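`handle()` ends with `eventQueue.put(event)`. On a bounded `BlockingQueue`, `put()` blocks the calling thread when the queue is full instead of throwing, so a full queue would look like a hang rather than a crash. The excerpt does not show how `eventQueue` is constructed. If it is a `LinkedBlockingQueue` created without a capacity, its capacity is `Integer.MAX_VALUE`, so the "Very low remaining capacity" message would practically never appear. This sketch (hypothetical class name) shows both behaviours, using `offer()` with a timeout in place of `put()` only so that it cannot hang.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of BlockingQueue capacity behaviour.
public class FullQueueSketch {

  // Fills a bounded queue that has no consumer, then tries one more insert.
  // Returns true if that insert was not accepted within 200 ms, i.e. a plain
  // put() would have blocked the calling thread.
  public static boolean extraPutWouldBlock(int capacity) throws InterruptedException {
    BlockingQueue<Integer> q = new LinkedBlockingQueue<>(capacity);
    for (int i = 0; i < capacity; i++) {
      q.put(i); // there is still room, so this returns immediately
    }
    boolean accepted = q.offer(capacity, 200, TimeUnit.MILLISECONDS);
    return !accepted;
  }

  // The no-argument constructor uses capacity Integer.MAX_VALUE.
  public static int unboundedRemainingCapacity() {
    return new LinkedBlockingQueue<Integer>().remainingCapacity();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(extraPutWouldBlock(1000));     // prints true
    System.out.println(unboundedRemainingCapacity()); // prints 2147483647
  }
}
```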
>> Logs:
>>
>> grep 'Size of event-queue' yarn-hadoop-resourcemanager-gdc-hm01-formal.i.nease.net.log
>> 2015-05-29 00:54:46,985 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1000
>> 2015-05-29 00:55:28,850 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2000
>> 2015-05-29 00:56:10,204 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 3000
>> 2015-05-29 00:56:51,995 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4000
>> 2015-05-29 00:57:33,981 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 5000
>> 2015-05-29 00:58:15,324 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 6000
>> 2015-05-29 00:58:57,111 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 7000
>> 2015-05-29 00:59:38,593 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 8000
>> 2015-05-29 01:00:20,215 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 9000
>> 2015-05-29 01:01:00,559 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 10000
>> 2015-05-29 01:01:39,614 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 11000
>> 2015-05-29 01:02:21,364 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 12000
>> 2015-05-29 01:03:03,233 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 13000
>> 2015-05-29 01:03:44,701 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 14000
>> 2015-05-29 01:04:26,494 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 15000
>> 2015-05-29 01:05:08,180 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 16000
>> 2015-05-29 01:05:50,331 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 17000
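The grep pattern 'Size of event-queue' matches the messages logged by `AsyncDispatcher`, not the "Size of scheduler event-queue" message from `handle()` above (the word "scheduler" sits in the middle), so the queue growing in these logs appears to be the dispatcher's. From the first and last lines, the queue grew by 16000 events in about 663 s: roughly 24 events/s of net growth, or one log line every ~41 s. This is a minimal sketch that computes that rate from two log lines; the class name and regex are hypothetical and assume the exact line format shown.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: net queue growth rate between two "Size of event-queue" lines.
public class QueueGrowthRate {
  private static final DateTimeFormatter TS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
  // Group 1: timestamp at the start of the line; group 2: queue size at the end.
  private static final Pattern LINE =
      Pattern.compile("^(\\S+ \\S+) INFO .*Size of event-queue is (\\d+)$");

  public static double eventsPerSecond(String firstLine, String lastLine) {
    Matcher a = parse(firstLine);
    Matcher b = parse(lastLine);
    LocalDateTime t0 = LocalDateTime.parse(a.group(1), TS);
    LocalDateTime t1 = LocalDateTime.parse(b.group(1), TS);
    long deltaEvents = Long.parseLong(b.group(2)) - Long.parseLong(a.group(2));
    double seconds = Duration.between(t0, t1).toMillis() / 1000.0;
    return deltaEvents / seconds;
  }

  private static Matcher parse(String line) {
    Matcher m = LINE.matcher(line);
    if (!m.matches()) {
      throw new IllegalArgumentException("not a queue-size line: " + line);
    }
    return m;
  }

  public static void main(String[] args) {
    String first = "2015-05-29 00:54:46,985 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1000";
    String last = "2015-05-29 01:05:50,331 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 17000";
    // prints "24.1 events/s" (16000 events over 663.346 s)
    System.out.printf(Locale.ROOT, "%.1f events/s%n", eventsPerSecond(first, last));
  }
}
```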
>>
>>
>