[ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Junping Du updated YARN-4984:
-----------------------------
    Attachment: YARN-4984-v3.patch

> LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4984
>                 URL: https://issues.apache.org/jira/browse/YARN-4984
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: log-aggregation
>    Affects Versions: 2.7.2
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: YARN-4984-v2.patch, YARN-4984-v3.patch, YARN-4984.patch
>
>
> Due to YARN-4325, many stale applications still exist in the NM state store and
> are recovered after an NM restart. App initialization fails because the token is
> invalid, but the exception is swallowed and an aggregator thread is still
> created for the invalid app.
> The exception is:
> {noformat}
> 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService (LogAggregationService.java:run(300)) - Failed to setup application log directory for application_1448060878692_11842
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be found in cache
>         at org.apache.hadoop.ipc.Client.call(Client.java:1427)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>         at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>         at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
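
The failure mode described in the issue can be illustrated with a minimal, self-contained Java sketch. It is not the attached patch and does not use Hadoop classes; `AggregatorInitSketch`, `AppDirCreator`, and `registeredAggregators()` are hypothetical names introduced only for illustration. The sketch assumes the intended behavior is that a failure while creating the application log directory stops initialization, so no aggregator task is registered or started for the invalid app.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AggregatorInitSketch {
    /** Hypothetical stand-in for the remote log directory setup step. */
    interface AppDirCreator {
        void createAppDir(String appId) throws IOException;
    }

    private final Map<String, Runnable> aggregators = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final AppDirCreator dirCreator;

    AggregatorInitSketch(AppDirCreator dirCreator) {
        this.dirCreator = dirCreator;
    }

    /** Returns true if an aggregator was started, false if directory setup failed. */
    boolean initApp(String appId) {
        try {
            dirCreator.createAppDir(appId);
        } catch (IOException e) {
            // Log the failure and stop here: continuing would leave an
            // aggregator task running for an app that can never upload logs.
            System.err.println("Failed to set up log directory for " + appId
                + ": " + e.getMessage());
            return false;
        }
        Runnable aggregator = () -> { /* log aggregation work would run here */ };
        aggregators.put(appId, aggregator);
        pool.execute(aggregator);
        return true;
    }

    int registeredAggregators() {
        return aggregators.size();
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        // Simulate a stale app whose delegation token is no longer valid.
        AggregatorInitSketch service = new AggregatorInitSketch(appId -> {
            if (appId.startsWith("stale")) {
                throw new IOException("token can't be found in cache");
            }
        });
        System.out.println("stale app started: " + service.initApp("stale_app"));
        System.out.println("live app started: " + service.initApp("live_app"));
        System.out.println("aggregators: " + service.registeredAggregators());
        service.shutdown();
    }
}
```

In this sketch the failure is reported through the return value; rethrowing to the caller would serve the same purpose, as long as the step that creates the aggregator is never reached for an app whose directory setup failed.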