[jira] [Updated] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2016-12-22 Thread Aleksandr Balitsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Balitsky updated YARN-6019:
-
Component/s: resourcemanager

> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: YARN-6019
> URL: https://issues.apache.org/jira/browse/YARN-6019
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit an MR application (for example, the PI app with 50 containers)
> 2) Find the MRAppMaster process id for the application
> 3) Kill the MRAppMaster with kill -9
> *Expected:* The ResourceManager launches a new MRAppMaster container and
> RMAppAttempt, and the application finishes correctly.
> *Actually:* After the new MRAppMaster and RMAppAttempt are launched, the
> application fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1482408247195_0002_02_11 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for node1:43037
> 	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
> 	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
> 	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
> 	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
> 	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
> 	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends the "registerApplicationMaster" request to the RM,
> the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are
> transmitted to RMCommunicator in the RegisterApplicationMasterResponse (via
> the getNMTokensFromPreviousAttempts method), but we don't handle them in the
> RMCommunicator.register method. The RM doesn't transmit these tokens again in
> later allocate responses, so we never have them in the NMTokenCache, and
> accordingly we get the "No NMToken sent for node" exception.
> I have found that this issue appears after the changes from
> https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
> I tried the same scenario without that commit, and the application completed
> successfully after MRAppMaster recovery.






[jira] [Commented] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2016-12-22 Thread Aleksandr Balitsky (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770392#comment-15770392
 ] 

Aleksandr Balitsky commented on YARN-6019:
--

I tried saving the NMTokens and containers from the previous attempt in
RMCommunicator and ran the steps to reproduce again. The application finished
successfully with no exceptions. Could somebody review my patch (001)?
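
A minimal sketch of the idea, assuming the fix caches the previous-attempt
tokens when the new attempt registers (the helper name and its placement in
RMCommunicator are illustrative, not the literal patch):

{noformat}
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.NMToken;
import org.apache.hadoop.yarn.client.api.NMTokenCache;

// Illustrative helper, called with the response from
// registerApplicationMaster(): cache the NMTokens that the RM generated for
// the containers surviving from the previous attempt, so that
// ContainerLauncherImpl can build NM proxies for the recovered containers.
public static void cacheTokensFromPreviousAttempts(
    RegisterApplicationMasterResponse response) {
  List<NMToken> prevTokens = response.getNMTokensFromPreviousAttempts();
  if (prevTokens == null) {
    return;
  }
  for (NMToken token : prevTokens) {
    // The cache is keyed by "host:port", which is exactly what
    // ContainerManagementProtocolProxy looks up before failing with
    // "No NMToken sent for <node>".
    NMTokenCache.getSingleton().setToken(
        token.getNodeId().toString(), token.getToken());
  }
}
{noformat}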







[jira] [Updated] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2016-12-22 Thread Aleksandr Balitsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Balitsky updated YARN-6019:
-
Attachment: YARN-6019.001.patch







[jira] [Created] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2016-12-22 Thread Aleksandr Balitsky (JIRA)
Aleksandr Balitsky created YARN-6019:


 Summary: MR application fails with "No NMToken sent" exception 
after MRAppMaster recovery
 Key: YARN-6019
 URL: https://issues.apache.org/jira/browse/YARN-6019
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
 Environment: Centos 7
Reporter: Aleksandr Balitsky
Priority: Critical









[jira] [Updated] (YARN-5691) RM failed Failed to load/recover state due to bad DelegationKey in RM State Store

2016-09-29 Thread Aleksandr Balitsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Balitsky updated YARN-5691:
-
Attachment: YARN_5691_v1_001_patch.patch


[jira] [Created] (YARN-5691) RM failed Failed to load/recover state due to bad DelegationKey in RM State Store

2016-09-29 Thread Aleksandr Balitsky (JIRA)
Aleksandr Balitsky created YARN-5691:


 Summary: RM failed Failed to load/recover state due to bad 
DelegationKey in RM State Store
 Key: YARN-5691
 URL: https://issues.apache.org/jira/browse/YARN-5691
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.3, 2.7.2, 2.7.1, 2.7.0
Reporter: Aleksandr Balitsky
Priority: Minor


RM failed during recovery with the following error:

{noformat}
2016-09-12 21:32:21,999 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to load/recover state
java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:267)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
	at org.apache.hadoop.security.token.delegation.DelegationKey.readFields(DelegationKey.java:110)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:346)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:199)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1007)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1048)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1044)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1044)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1084)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1221)
2016-09-12 21:32:22,002 INFO org.apache.hadoop.service.AbstractService: Service RMActiveServices failed in state STARTED; cause: java.io.EOFException
java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:267)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
	at org.apache.hadoop.security.token.delegation.DelegationKey.readFields(DelegationKey.java:110)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:346)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:199)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1007)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1048)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1044)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1044)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1084)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1221)
2016-09-12 21:32:22,008 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2016-09-12 21:32:22,009 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2016-09-12 21:32:22,009 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2016-09-12 21:32:22,010 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is 
{noformat}
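
The EOFException comes from DelegationKey.readFields() hitting the end of a
truncated or partially written delegation-key file while
FileSystemRMStateStore.loadRMDTSecretManagerState() replays the store. One
possible mitigation, sketched below, is to skip an unreadable key file instead
of aborting the whole recovery; this is an illustration of the approach, not
necessarily what YARN_5691_v1_001_patch.patch does, and the helper is
hypothetical:

{noformat}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

import org.apache.hadoop.security.token.delegation.DelegationKey;

public class DelegationKeyReader {
  // Deserialize one delegation key from the bytes of a state-store file.
  // A truncated file makes readFields() throw EOFException; return null for
  // such entries so the caller can skip them rather than fail RM startup.
  public static DelegationKey readKeyOrNull(byte[] data, String fileName)
      throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    try {
      DelegationKey key = new DelegationKey();
      key.readFields(in);
      return key;
    } catch (EOFException e) {
      System.err.println("Skipping corrupt delegation key file " + fileName);
      return null;
    } finally {
      in.close();
    }
  }
}
{noformat}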

[jira] [Resolved] (YARN-5619) Provide way to limit MRJob's stdout/stderr size

2016-09-14 Thread Aleksandr Balitsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Balitsky resolved YARN-5619.
--
Resolution: Duplicate

> Provide way to limit MRJob's stdout/stderr size
> ---
>
> Key: YARN-5619
> URL: https://issues.apache.org/jira/browse/YARN-5619
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.7.0
>Reporter: Aleksandr Balitsky
>Priority: Minor
>
> We can run a job that produces a huge amount of stdout/stderr output, causing
> undesired consequences. There is already a JIRA that has been open for a
> while now:
> https://issues.apache.org/jira/browse/YARN-2231
> A possible solution is to redirect stdout and stderr to log4j in the
> YarnChild.java main method:
> System.setErr(new PrintStream(new LoggingOutputStream(logger, Level.ERROR), true));
> System.setOut(new PrintStream(new LoggingOutputStream(logger, Level.INFO), true));
> In this case System.out and System.err will be redirected to a log4j logger
> with an appropriate appender that directs the output to the stdout or stderr
> files with the needed size limitation.
> Advantages of this solution:
> - It allows us to restrict file sizes during job execution.
> Disadvantages:
> - It works only for MR jobs.
> - Logs are stored in memory and flushed to disk only after the job finishes
> (syslog works the same way), so we can lose logs if the container is killed
> or fails.
> Is this an appropriate solution to the problem, or is there something better?
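
log4j 1.x does not ship a LoggingOutputStream, so the snippet above presumably
relies on a custom adapter. A minimal sketch of what such a class could look
like, together with the proposed redirection (the class shape and the logger
name are assumptions):

{noformat}
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Line-buffering adapter from an OutputStream to a log4j 1.x Logger: bytes
// are accumulated until a newline, then emitted as one log event at the
// configured level.
public class LoggingOutputStream extends OutputStream {
  private final Logger logger;
  private final Level level;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  public LoggingOutputStream(Logger logger, Level level) {
    this.logger = logger;
    this.level = level;
  }

  @Override
  public void write(int b) {
    if (b == '\n') {
      flush();            // one log event per line
    } else {
      buffer.write(b);
    }
  }

  @Override
  public void flush() {
    if (buffer.size() > 0) {
      logger.log(level, buffer.toString());
      buffer.reset();
    }
  }

  // Redirection as proposed for YarnChild.main(); the logger name is
  // illustrative.
  public static void redirectStdStreams() {
    Logger log = Logger.getLogger("YarnChild");
    System.setErr(new PrintStream(new LoggingOutputStream(log, Level.ERROR), true));
    System.setOut(new PrintStream(new LoggingOutputStream(log, Level.INFO), true));
  }
}
{noformat}

With a size-capped appender (for example a RollingFileAppender) attached to
that logger, the file size would be limited during job execution rather than
only at aggregation time.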






[jira] [Updated] (YARN-5619) Provide way to limit MRJob's stdout/stderr size

2016-09-06 Thread Aleksandr Balitsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Balitsky updated YARN-5619:
-






[jira] [Created] (YARN-5619) Provide way to limit MRJob's stdout/stderr size

2016-09-06 Thread Aleksandr Balitsky (JIRA)
Aleksandr Balitsky created YARN-5619:


 Summary: Provide way to limit MRJob's stdout/stderr size
 Key: YARN-5619
 URL: https://issues.apache.org/jira/browse/YARN-5619
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation, nodemanager
Affects Versions: 2.7.0
Reporter: Aleksandr Balitsky
Priority: Minor






