[jira] [Updated] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4984:
-
Attachment: YARN-4984-v4.patch

Fixed the test failure seen with the v3 patch, which turned out to be a test 
issue: the local log dir should not be deleted when the log aggregation service 
cannot continue because directory creation on the remote FS failed.
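
To illustrate the behaviour the corrected test expects, here is a minimal sketch. It is not code from the patch: the class LocalLogCleanupSketch, the method cleanUp and the aggregationSucceeded flag are all hypothetical names, and the local log dir is modelled as a plain java.nio path rather than through Hadoop's FileSystem API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not part of YARN: local logs are removed only after
// aggregation to the remote FS has actually completed.
final class LocalLogCleanupSketch {

    /**
     * Deletes localLogDir recursively when aggregation succeeded.
     * Returns true if the directory was deleted, false if it was kept.
     */
    static boolean cleanUp(Path localLogDir, boolean aggregationSucceeded)
            throws IOException {
        if (!aggregationSucceeded) {
            // Aggregation could not proceed (for example, the remote app dir
            // could not be created), so the local copy is the only copy.
            return false;
        }
        try (Stream<Path> walk = Files.walk(localLogDir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return true;
    }
}
```

With this shape, a test that simulates a remote directory creation failure would assert that the local log dir still exists afterwards.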

> LogAggregationService shouldn't swallow exception in handling createAppDir() 
> which cause thread leak.
> -
>
> Key: YARN-4984
> URL: https://issues.apache.org/jira/browse/YARN-4984
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation
> Affects Versions: 2.7.2
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: YARN-4984-v2.patch, YARN-4984-v3.patch, YARN-4984-v4.patch, YARN-4984.patch
>
>
> Due to YARN-4325, many stale applications still exist in the NM state store and 
> get recovered after an NM restart. App initialization then fails because the 
> token is invalid, but the exception is swallowed and an aggregator thread is 
> still created for the invalid app.
> Exception is:
> {noformat}
> 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService (LogAggregationService.java:run(300)) - Failed to setup application log directory for application_1448060878692_11842
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be found in cache
>     at org.apache.hadoop.ipc.Client.call(Client.java:1427)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>     at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>     at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>     at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> {noformat}
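
The failure mode described in the issue can be sketched in isolation. The example below is an assumption-laden illustration, not the actual LogAggregationService code: the class AggregatorInitSketch, the RemoteDirCreator interface and both initApp variants are hypothetical names chosen to contrast a swallowed exception with a propagated one.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the init path; none of these names come from YARN.
final class AggregatorInitSketch {

    /** Stand-in for the remote-FS directory setup, which may fail. */
    interface RemoteDirCreator {
        void createAppDir(String appId) throws IOException;
    }

    private final Map<String, Thread> aggregators = new ConcurrentHashMap<>();
    private final RemoteDirCreator creator;

    AggregatorInitSketch(RemoteDirCreator creator) {
        this.creator = creator;
    }

    /** Swallowing variant: the failure is only logged, so a thread is started anyway. */
    void initAppSwallowing(String appId) {
        try {
            creator.createAppDir(appId);
        } catch (IOException e) {
            System.err.println("Failed to setup application log directory for " + appId);
        }
        startAggregator(appId); // reached even when the directory setup failed
    }

    /** Propagating variant: the failure reaches the caller and no thread is started. */
    void initAppPropagating(String appId) throws IOException {
        creator.createAppDir(appId); // an IOException here skips the next line
        startAggregator(appId);
    }

    private void startAggregator(String appId) {
        // The real aggregator would wait for the app to finish and upload logs;
        // this placeholder thread does nothing and exits immediately.
        Thread t = new Thread(() -> { }, "LogAggregator-" + appId);
        t.setDaemon(true);
        aggregators.put(appId, t);
        t.start();
    }

    /** Number of aggregators registered so far. */
    int aggregatorCount() {
        return aggregators.size();
    }
}
```

With a RemoteDirCreator that always throws, initAppSwallowing still registers one aggregator for the invalid app, while initAppPropagating registers none and lets the caller decide how to handle the failure.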



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4984:
-
Attachment: YARN-4984-v3.patch

> LogAggregationService shouldn't swallow exception in handling createAppDir() 
> which cause thread leak.
> -
>
> Key: YARN-4984
> URL: https://issues.apache.org/jira/browse/YARN-4984
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation
> Affects Versions: 2.7.2
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: YARN-4984-v2.patch, YARN-4984-v3.patch, YARN-4984.patch
>
>





[jira] [Updated] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4984:
-
Attachment: YARN-4984-v2.patch

Thanks [~leftnoteasy] for the review and comments!
bq.  We may need to remove following statement as well.
Nice catch. Removed this unnecessary code in the v2 patch.

> LogAggregationService shouldn't swallow exception in handling createAppDir() 
> which cause thread leak.
> -
>
> Key: YARN-4984
> URL: https://issues.apache.org/jira/browse/YARN-4984
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation
> Affects Versions: 2.7.2
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: YARN-4984-v2.patch, YARN-4984.patch
>
>





[jira] [Updated] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4984:
-
Attachment: YARN-4984.patch

> LogAggregationService shouldn't swallow exception in handling createAppDir() 
> which cause thread leak.
> -
>
> Key: YARN-4984
> URL: https://issues.apache.org/jira/browse/YARN-4984
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation
> Affects Versions: 2.7.2
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: YARN-4984.patch
>
>


