[
https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148643#comment-15148643
]
Steve Loughran commented on YARN-4695:
--------------------------------------
Fuller stack. What's happening is that the RM has been shut down (to be precise:
there isn't an RM running, not even a little one). The log parser is asking for
the latest list of running apps and timing out; this puts the IPC client into
its retry loop, which stops the shutdown from working.
{code}
2016-02-16 14:13:28,126 [IPC Server Responder] INFO ipc.Server
(Server.java:run(959)) - Stopping IPC Server Responder
2016-02-16 14:13:28,126 [ScalaTest-main-running-TimelineListenerSuite] INFO
timeline.EntityGroupFSTimelineStore
(EntityGroupFSTimelineStore.java:serviceStop(275)) - Stopping
EntityGroupFSTimelineStore
2016-02-16 14:13:28,127 [ScalaTest-main-running-TimelineListenerSuite] INFO
timeline.EntityGroupFSTimelineStore
(EntityGroupFSTimelineStore.java:serviceStop(279)) - Waiting for executor to
terminate
2016-02-16 14:13:28,966 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:29,970 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 1 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:30,975 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 2 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:31,980 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 3 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:32,986 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 4 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:33,995 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 5 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:35,000 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:36,005 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 7 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:37,010 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 8 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:38,015 [EntityLogPluginWorker #1] INFO ipc.Client
(Client.java:handleConnectionFailure(897)) - Retrying connect to server:
www.bbc.co.uk/0.0.0.0:8032. Already tried 9 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:38,133 [ScalaTest-main-running-TimelineListenerSuite] WARN
timeline.EntityGroupFSTimelineStore
(EntityGroupFSTimelineStore.java:serviceStop(284)) - Executor did not terminate
2016-02-16 14:13:38,133 [ScalaTest-main-running-TimelineListenerSuite] INFO
timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:serviceStop(250)) -
Waiting for deletion thread to complete its current action
2016-02-16 14:13:38,133 [Thread-9] INFO timeline.LeveldbTimelineStore
(LeveldbTimelineStore.java:run(296)) - Deletion thread received interrupt,
exiting
2016-02-16 14:13:38,134 [EntityLogPluginWorker #1] ERROR
timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(693))
- Error processing logs for application_1111_0000
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy25.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:448)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getAppState(EntityGroupFSTimelineStore.java:464)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.access$700(EntityGroupFSTimelineStore.java:79)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:529)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:519)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$ActiveLogParser.run(EntityGroupFSTimelineStore.java:686)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:159)
... 14 more
{code}
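For illustration, here is a minimal sketch of the sequence the log suggests: a worker sits in a fixed-sleep retry loop, a graceful shutdown waits and times out, and only the forced shutdown interrupts the sleep. It uses plain java.util.concurrent rather than the Hadoop classes; the class and method names (InterruptedRetrySketch, retryConnect, stopWithInterrupt) are hypothetical, and this is not the actual serviceStop() code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a log-parser worker stuck in IPC retries. */
final class InterruptedRetrySketch {

  /** Mimics a fixed-sleep retry loop against a server that never answers. */
  static void retryConnect(int maxRetries, long sleepMillis)
      throws InterruptedException {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      // A real client would try to connect here; this sketch always "fails".
      Thread.sleep(sleepMillis); // an interrupt surfaces here as "sleep interrupted"
    }
  }

  /**
   * Starts one worker, then stops the executor.
   * Returns true if the graceful wait timed out and the worker only
   * exited because shutdownNow() interrupted its retry sleep.
   */
  static boolean stopWithInterrupt(long awaitMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    AtomicBoolean interrupted = new AtomicBoolean(false);
    executor.submit(() -> {
      try {
        retryConnect(10, 1000); // same shape as maxRetries=10, sleepTime=1000 ms
      } catch (InterruptedException e) {
        interrupted.set(true);
      }
    });
    try {
      executor.shutdown(); // graceful: running tasks are not interrupted
      boolean terminated =
          executor.awaitTermination(awaitMillis, TimeUnit.MILLISECONDS);
      if (!terminated) {
        executor.shutdownNow(); // forced: interrupts the sleeping worker
        executor.awaitTermination(5, TimeUnit.SECONDS);
      }
      return !terminated && interrupted.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      executor.shutdownNow();
      return false;
    }
  }
}
```

Calling InterruptedRetrySketch.stopWithInterrupt(200) waits 200 ms for the worker, gives up, and then interrupts it, which is the same shape as the "Executor did not terminate" warning followed by the InterruptedException above.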
> EntityGroupFSTimelineStore to not log errors during shutdown
> ------------------------------------------------------------
>
> Key: YARN-4695
> URL: https://issues.apache.org/jira/browse/YARN-4695
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
>
> # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised
> during their execution.
> # The service stops by interrupting all its workers.
> # As a result, the workers all log exceptions at ERROR level *even during a
> managed shutdown*.
> # This creates distracting noise in the logs.
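One possible way to quieten this, sketched below under assumptions: before logging a worker failure, check whether the service is stopping and whether the failure was caused by an interrupt, and pick a lower log level in that case. The class and method names (ShutdownAwareLogging, causedByInterrupt, levelFor) are hypothetical, java.util.logging levels stand in for whatever logging API the store uses, and this is not the committed fix.

```java
import java.util.logging.Level;

/** Hypothetical helper for choosing a log level for worker failures. */
final class ShutdownAwareLogging {

  /** Walks the cause chain looking for an InterruptedException. */
  static boolean causedByInterrupt(Throwable failure) {
    for (Throwable t = failure; t != null; t = t.getCause()) {
      if (t instanceof InterruptedException) {
        return true;
      }
    }
    return false;
  }

  /**
   * An interrupt seen while the service is stopping is expected, so it is
   * logged quietly; every other failure is still reported as an error.
   */
  static Level levelFor(Throwable failure, boolean stopping) {
    return (stopping && causedByInterrupt(failure)) ? Level.FINE : Level.SEVERE;
  }
}
```

With this shape, an UndeclaredThrowableException wrapping "sleep interrupted" would be logged at FINE during a managed shutdown, while the same exception outside a shutdown, or any unrelated failure, would still be logged at SEVERE.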