[jira] [Updated] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
[ https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Balitsky updated YARN-6019:
-------------------------------------
    Component/s: resourcemanager

> MR application fails with "No NMToken sent" exception after MRAppMaster recovery
>
>                 Key: YARN-6019
>                 URL: https://issues.apache.org/jira/browse/YARN-6019
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, yarn
>    Affects Versions: 2.7.0
>         Environment: CentOS 7
>            Reporter: Aleksandr Balitsky
>            Priority: Critical
>         Attachments: YARN-6019.001.patch
[jira] [Commented] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
[ https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770392#comment-15770392 ]

Aleksandr Balitsky commented on YARN-6019:
------------------------------------------

I tried saving the NMTokens and containers from the previous attempt in the RMCommunicator and ran through the steps to reproduce again. The application finished successfully with no exceptions. Could somebody review my patch (001)?

> MR application fails with "No NMToken sent" exception after MRAppMaster recovery
>
>                 Key: YARN-6019
>                 URL: https://issues.apache.org/jira/browse/YARN-6019
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.0
>         Environment: CentOS 7
>            Reporter: Aleksandr Balitsky
>            Priority: Critical
>         Attachments: YARN-6019.001.patch
[jira] [Updated] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
[ https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Balitsky updated YARN-6019:
-------------------------------------
    Attachment: YARN-6019.001.patch

> MR application fails with "No NMToken sent" exception after MRAppMaster recovery
>
>                 Key: YARN-6019
>                 URL: https://issues.apache.org/jira/browse/YARN-6019
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.0
>         Environment: CentOS 7
>            Reporter: Aleksandr Balitsky
>            Priority: Critical
>         Attachments: YARN-6019.001.patch
[jira] [Created] (YARN-6019) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
Aleksandr Balitsky created YARN-6019:
----------------------------------------

             Summary: MR application fails with "No NMToken sent" exception after MRAppMaster recovery
                 Key: YARN-6019
                 URL: https://issues.apache.org/jira/browse/YARN-6019
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 2.7.0
         Environment: CentOS 7
            Reporter: Aleksandr Balitsky
            Priority: Critical


*Steps to reproduce:*
1) Submit an MR application (for example, the Pi app with 50 containers)
2) Find the MRAppMaster process ID for the application
3) Kill the MRAppMaster with the kill -9 command

*Expected:* The ResourceManager launches a new MRAppMaster container and RMAppAttempt, and the application finishes correctly.

*Actual:* After the new MRAppMaster and RMAppAttempt are launched, the application fails with the following exception:

{noformat}
2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1482408247195_0002_02_11 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for node1:43037
        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

*Problem:*
When the RMCommunicator sends a "registerApplicationMaster" request to the RM, the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are transmitted to the RMCommunicator in the RegisterApplicationMasterResponse (via the getNMTokensFromPreviousAttempts method). But we don't handle these tokens in the RMCommunicator.register method. The RM doesn't transmit these tokens again for subsequent allocate requests, and we don't have them in the NMTokenCache, so we get the "No NMToken sent for node" exception.

I have found that this issue appeared with the changes from https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed. I tried the same scenario without that commit, and the application completed successfully after MRAppMaster recovery.
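For concreteness, a minimal sketch of the fix direction the description points at; this is an assumption about the approach, not the contents of YARN-6019.001.patch, and the class and method names are illustrative. After registerApplicationMaster returns, the NMTokens carried in the response for the previous attempt's nodes would be pushed into the NMTokenCache, so container launches on those nodes can authenticate:

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.NMToken;
import org.apache.hadoop.yarn.client.api.NMTokenCache;

public class NMTokenRecoverySketch {
  // Intended to run right after registerApplicationMaster(...) returns,
  // e.g. from RMCommunicator.register.
  static void cacheTokensFromPreviousAttempts(RegisterApplicationMasterResponse response) {
    List<NMToken> tokens = response.getNMTokensFromPreviousAttempts();
    for (NMToken nmToken : tokens) {
      // Without this step the RM never resends these tokens, so launches on
      // these nodes fail with "No NMToken sent for <node>".
      NMTokenCache.setNMToken(nmToken.getNodeId().toString(), nmToken.getToken());
    }
  }
}
{code}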
[jira] [Updated] (YARN-5691) RM failed to load/recover state due to bad DelegationKey in RM State Store
[ https://issues.apache.org/jira/browse/YARN-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Balitsky updated YARN-5691:
-------------------------------------
    Attachment: YARN_5691_v1_001_patch.patch

> RM failed to load/recover state due to bad DelegationKey in RM State Store
>
>                 Key: YARN-5691
>                 URL: https://issues.apache.org/jira/browse/YARN-5691
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0, 2.7.1, 2.7.2, 2.7.3
>            Reporter: Aleksandr Balitsky
>            Priority: Minor
>         Attachments: YARN_5691_v1_001_patch.patch
[jira] [Created] (YARN-5691) RM failed to load/recover state due to bad DelegationKey in RM State Store
Aleksandr Balitsky created YARN-5691:
----------------------------------------

             Summary: RM failed to load/recover state due to bad DelegationKey in RM State Store
                 Key: YARN-5691
                 URL: https://issues.apache.org/jira/browse/YARN-5691
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.7.3, 2.7.2, 2.7.1, 2.7.0
            Reporter: Aleksandr Balitsky
            Priority: Minor


The RM failed during recovery with the following error:

{noformat}
2016-09-12 21:32:21,999 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to load/recover state
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
        at org.apache.hadoop.security.token.delegation.DelegationKey.readFields(DelegationKey.java:110)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:346)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:199)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1007)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1048)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1044)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1044)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1084)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1221)
2016-09-12 21:32:22,002 INFO org.apache.hadoop.service.AbstractService: Service RMActiveServices failed in state STARTED; cause: java.io.EOFException
java.io.EOFException
        ... (same stack trace as above)
2016-09-12 21:32:22,008 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2016-09-12 21:32:22,009 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2016-09-12 21:32:22,009 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
{noformat}
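To make the failure mode concrete: DelegationKey.readFields reads the key id as a varint first, so a zero-length or partially written key file in the state store fails on the very first byte, exactly matching the stack trace above (readByte, readVLong, readVInt, DelegationKey.readFields). A self-contained repro sketch, using an empty byte array as an assumed stand-in for a corrupt key file:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;

import org.apache.hadoop.security.token.delegation.DelegationKey;

public class BadDelegationKeyRepro {
  public static void main(String[] args) throws Exception {
    DelegationKey key = new DelegationKey();
    try {
      // An empty stream stands in for a zero-length DelegationKey file in the
      // RM state store; readFields hits EOF on the first varint read.
      key.readFields(new DataInputStream(new ByteArrayInputStream(new byte[0])));
    } catch (EOFException e) {
      // Same exception the RM hits in loadRMDTSecretManagerState during recovery.
      System.out.println("Truncated key data: " + e);
    }
  }
}
{code}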
[jira] [Resolved] (YARN-5619) Provide way to limit MRJob's stdout/stderr size
[ https://issues.apache.org/jira/browse/YARN-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Balitsky resolved YARN-5619.
--------------------------------------
    Resolution: Duplicate

> Provide way to limit MRJob's stdout/stderr size
>
>                 Key: YARN-5619
>                 URL: https://issues.apache.org/jira/browse/YARN-5619
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation, nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Aleksandr Balitsky
>            Priority: Minor
[jira] [Updated] (YARN-5619) Provide way to limit MRJob's stdout/stderr size
[ https://issues.apache.org/jira/browse/YARN-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Balitsky updated YARN-5619:
-------------------------------------
    Description:
We can run a job that produces a huge amount of stdout/stderr output, causing undesired consequences. There is already a JIRA that has been open for a while now: https://issues.apache.org/jira/browse/YARN-2231

A possible solution is to redirect stdout and stderr to log4j in YarnChild.java's main method via:

System.setErr(new PrintStream(new LoggingOutputStream(<logger>, Level.ERROR), true));
System.setOut(new PrintStream(new LoggingOutputStream(<logger>, Level.INFO), true));

In this case System.out and System.err will be redirected to a log4j logger with an appropriate appender that directs output to the stderr or stdout files with the needed size limitation.

Advantages of such a solution:
- It allows us to restrict file sizes during job execution.

Disadvantages:
- It will work only for MR jobs.
- Logs are stored in memory and are flushed to disk only when the job finishes (syslog works the same way), so we can lose logs if the container is killed or fails.

Is this an appropriate solution for this problem, or is there something better?

> Provide way to limit MRJob's stdout/stderr size
>
>                 Key: YARN-5619
>                 URL: https://issues.apache.org/jira/browse/YARN-5619
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation, nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Aleksandr Balitsky
>            Priority: Minor
[jira] [Created] (YARN-5619) Provide way to limit MRJob's stdout/stderr size
Aleksandr Balitsky created YARN-5619:
----------------------------------------

             Summary: Provide way to limit MRJob's stdout/stderr size
                 Key: YARN-5619
                 URL: https://issues.apache.org/jira/browse/YARN-5619
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: log-aggregation, nodemanager
    Affects Versions: 2.7.0
            Reporter: Aleksandr Balitsky
            Priority: Minor


We can run a job that produces a huge amount of stdout/stderr output, causing undesired consequences. There is already a JIRA that has been open for a while now: https://issues.apache.org/jira/browse/YARN-2231

A possible solution is to redirect stdout and stderr to log4j in YarnChild.java's main method via:

System.setErr(new PrintStream(new LoggingOutputStream(<logger>, Level.ERROR), true));
System.setOut(new PrintStream(new LoggingOutputStream(<logger>, Level.INFO), true));

In this case System.out and System.err will be redirected to a log4j logger with an appropriate appender that directs output to the stderr or stdout files with the needed size limitation.

Advantages of such a solution:
- It allows us to restrict file sizes during job execution.

Disadvantages:
- It will work only for MR jobs.
- Logs are stored in memory and are flushed to disk only when the job finishes (syslog works the same way), so we can lose logs if the container is killed or fails.

Is this an appropriate solution for this problem, or is there something better?
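The LoggingOutputStream class in the snippet above is not defined in the description, so the following is a hypothetical minimal version of what the proposal assumes: a line-buffering OutputStream that forwards completed lines to a log4j 1.x Logger (the <logger> placeholder above) at a fixed level, which a size-capped appender can then bound:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

class LoggingOutputStream extends OutputStream {
  private final Logger logger;
  private final Level level;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  LoggingOutputStream(Logger logger, Level level) {
    this.logger = logger;
    this.level = level;
  }

  @Override
  public void write(int b) throws IOException {
    if (b == '\n') {
      // Forward one completed line to the logger, then reset the buffer.
      logger.log(level, buffer.toString("UTF-8"));
      buffer.reset();
    } else {
      buffer.write(b);
    }
  }
}
{code}

With such a class, the redirect in YarnChild.main would look like the following (the "stdout"/"stderr" logger names are illustrative):

{code:java}
import java.io.PrintStream;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Route the task's stdout/stderr through log4j so a size-capped appender
// can enforce the limit during job execution.
System.setErr(new PrintStream(new LoggingOutputStream(Logger.getLogger("stderr"), Level.ERROR), true));
System.setOut(new PrintStream(new LoggingOutputStream(Logger.getLogger("stdout"), Level.INFO), true));
{code}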