[jira] [Updated] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun Chen updated YARN-2674:
----------------------------
    Attachment: YARN-2674.4.patch

> Distributed shell AM may re-launch containers if RM work preserving restart happens
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-2674
>                 URL: https://issues.apache.org/jira/browse/YARN-2674
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Chun Chen
>            Assignee: Chun Chen
>         Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch, YARN-2674.4.patch
>
> Currently, if an RM work-preserving restart happens while distributed shell is running, the distributed shell AM may re-launch all of the containers, including new, running, and completed ones. We must make sure it won't re-launch the running/completed containers.
> We need to remove allocated containers from AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
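The last point of the description — dropping satisfied requests from the client-side table so a recovered RM is not re-asked for containers the AM already holds — can be sketched generically. This is not the real AMRMClientImpl#remoteRequestsTable (which is keyed by priority, resource name, and capability); the simplified priority-keyed map below is an assumption for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reduction of a client-side request table: counts are
// decremented as containers arrive, so re-registering with a restarted RM
// does not re-request containers the AM already received.
public class RequestTableSketch {
    private final Map<Integer, Integer> outstanding = new HashMap<>();

    public void addRequest(int priority, int count) {
        outstanding.merge(priority, count, Integer::sum);
    }

    // Called for each container received in an allocate response.
    public void onContainerAllocated(int priority) {
        // Decrement, removing the entry once it reaches zero.
        outstanding.computeIfPresent(priority, (p, n) -> n <= 1 ? null : n - 1);
    }

    public int pending(int priority) {
        return outstanding.getOrDefault(priority, 0);
    }
}
```

After an RM restart, the AM would re-send only the `pending` counts, rather than the original request totals.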
[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574106#comment-14574106 ]

Chun Chen commented on YARN-2674:
---------------------------------
Uploaded a patch to fix the test failures.
[jira] [Created] (YARN-3770) SerializedException should also handle java.land.Error
Lavkesh Lahngir created YARN-3770:
-------------------------------------

             Summary: SerializedException should also handle java.land.Error
                 Key: YARN-3770
                 URL: https://issues.apache.org/jira/browse/YARN-3770
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Lavkesh Lahngir
            Assignee: Lavkesh Lahngir

In SerializedExceptionPBImpl:
{code}
Class classType = null;
if (YarnException.class.isAssignableFrom(realClass)) {
  classType = YarnException.class;
} else if (IOException.class.isAssignableFrom(realClass)) {
  classType = IOException.class;
} else if (RuntimeException.class.isAssignableFrom(realClass)) {
  classType = RuntimeException.class;
} else {
  classType = Exception.class;
}
return instantiateException(realClass.asSubclass(classType), getMessage(),
    cause == null ? null : cause.deSerialize());
}
{code}
If realClass is a subclass of java.lang.Error, deSerialize() throws ClassCastException. In the last else statement, classType should be Throwable.class instead of Exception.class.
[jira] [Updated] (YARN-3770) SerializedException should also handle java.lang.Error
[ https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lavkesh Lahngir updated YARN-3770:
----------------------------------
    Summary: SerializedException should also handle java.lang.Error  (was: SerializedException should also handle java.land.Error)
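The fix proposed for this issue — falling back to Throwable.class rather than Exception.class — can be sketched with stdlib types only. This is an illustrative reduction, not the actual SerializedExceptionPBImpl code; the YarnException branch is omitted because that class is YARN-specific.

```java
import java.io.IOException;

// Illustrative sketch of the classType fallback proposed in YARN-3770.
// Not the real SerializedExceptionPBImpl; the YarnException branch is omitted.
public class ClassTypeDemo {
    public static Class<? extends Throwable> pickClassType(Class<?> realClass) {
        if (IOException.class.isAssignableFrom(realClass)) {
            return IOException.class;
        } else if (RuntimeException.class.isAssignableFrom(realClass)) {
            return RuntimeException.class;
        } else {
            // Throwable.class (not Exception.class) keeps asSubclass() from
            // throwing ClassCastException when realClass extends java.lang.Error.
            return Throwable.class;
        }
    }

    public static void main(String[] args) {
        // With Exception.class this cast would fail for an Error subclass:
        Class<? extends Throwable> t =
            StackOverflowError.class.asSubclass(pickClassType(StackOverflowError.class));
        System.out.println(t.getName()); // prints java.lang.StackOverflowError
    }
}
```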
[jira] [Updated] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun Chen updated YARN-2674:
----------------------------
    Attachment: YARN-2674.5.patch

Uploaded YARN-2674.5.patch to remove an unnecessary synchronized block.
[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574177#comment-14574177 ]

Hadoop QA commented on YARN-2674:
---------------------------------
(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 6s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. |
| {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 17s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 43s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 9s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests | 6m 3s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 1m 51s | Tests passed in hadoop-yarn-server-tests. |
| | | | 56m 55s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-applications-distributedshell |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12737867/YARN-2674.4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b2540f4 |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8194/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8194/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8194/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/8194/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8194/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8194/console |

This message was automatically generated.
[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
[ https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574228#comment-14574228 ]

Rohith commented on YARN-3017:
------------------------------
bq. Could you give a little more detail about the possibility to break the rolling upgrade?
I was wondering whether it causes any issue while parsing the containerId after an upgrade. Say the current container-id format is container_1430441527236_0001_01_01, running on NM-1; after the upgrade, the container-id format changes, but the NM still reports running containers as container_1430441527236_0001_01_01.

> ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-3017
>                 URL: https://issues.apache.org/jira/browse/YARN-3017
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.8.0
>            Reporter: MUFEED USMAN
>            Priority: Minor
>              Labels: PatchAvailable
>         Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log, in the events surrounding the creation of a new application attempt:
> ...
> 2014-11-14 17:45:37,258 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1412150883650_0001_02
> ...
> The application attempt has the ID format "_1412150883650_0001_02", whereas the associated ContainerID goes by "_1412150883650_0001_02_".
> ...
> 2014-11-14 17:45:37,260 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1412150883650_0001_02_01, NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: vCores:1, disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
> ...
> Curious to know if this is kept like that for a reason. If not, then while using filtering tools to, say, grep events surrounding a specific attempt by the numeric ID part, information may slip out during troubleshooting.
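The padding mismatch under discussion can be reproduced with plain format strings. The widths below (4-digit app id, 6-digit attempt id, but only 2 digits for the attempt portion embedded in a container id) match the behavior reported above; treat them as an illustrative reconstruction rather than the authoritative YARN formatting code.

```java
// Illustrative reconstruction of the two ID formats compared in this issue.
// The format strings are assumptions matching the reported behavior, not the
// actual ApplicationAttemptId/ContainerId toString() implementations.
public class IdFormatDemo {
    public static String attemptId(long ts, int app, int attempt) {
        return String.format("appattempt_%d_%04d_%06d", ts, app, attempt);
    }

    public static String containerId(long ts, int app, int attempt, int container) {
        // The attempt portion is padded to 2 digits here, not 6 -- which is
        // why grepping the log for the full attempt id misses container lines.
        return String.format("container_%d_%04d_%02d_%06d", ts, app, attempt, container);
    }

    public static void main(String[] args) {
        System.out.println(attemptId(1412150883650L, 1, 2));
        System.out.println(containerId(1412150883650L, 1, 2, 1));
    }
}
```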
[jira] [Updated] (YARN-3770) SerializedException should also handle java.lang.Error
[ https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lavkesh Lahngir updated YARN-3770:
----------------------------------
    Description: Updated to note that the code in question is in SerializedExceptionPBImpl's deserialize() method; the rest of the description is unchanged.
[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574263#comment-14574263 ]

Hadoop QA commented on YARN-2674:
---------------------------------
(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 54s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. |
| {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 51s | The applied patch generated 1 new checkstyle issues (total was 47, now 47). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 8s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests | 6m 2s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 1m 51s | Tests passed in hadoop-yarn-server-tests. |
| | | | 56m 48s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12737886/YARN-2674.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b2540f4 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8195/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8195/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8195/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/8195/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8195/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8195/console |

This message was automatically generated.
[jira] [Commented] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched
[ https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574305#comment-14574305 ]

Bibin A Chundatt commented on YARN-3754:
----------------------------------------
[~rohithsharma] and [~sunilg] I have tried with a build containing YARN-3585 and YARN-3641; I am not able to reproduce the org.iq80.leveldb.DBException: Closed exception.

> Race condition when the NodeManager is shutting down and container is launched
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3754
>                 URL: https://issues.apache.org/jira/browse/YARN-3754
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>         Environment: Suse 11 Sp3
>            Reporter: Bibin A Chundatt
>            Assignee: Sunil G
>            Priority: Critical
>         Attachments: NM.log
>
> The container is launched and returned to ContainerImpl after the NodeManager has closed the DB connection, resulting in {{org.iq80.leveldb.DBException: Closed}}.
> *Attaching the exception trace*
> {code}
> 2015-05-30 02:11:49,122 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Unable to update state store diagnostics for container_e310_1432817693365_3338_01_02
> java.io.IOException: org.iq80.leveldb.DBException: Closed
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.iq80.leveldb.DBException: Closed
>         at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
>         at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259)
>         ... 15 more
> {code}
> We can add a check for whether the DB is closed while we move the container from the ACQUIRED state, as per the discussion in YARN-3585.
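The check suggested at the end of the description can be sketched as a guard on the state-store write path. The class and method names below are illustrative stand-ins, not the actual NMLeveldbStateStoreService API.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of guarding state-store writes against the shutdown
// race: once close() runs, writes fail with a clean IOException instead of
// reaching a closed leveldb handle.
public class GuardedStateStore {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public void storeContainerDiagnostics(String containerId, String diagnostics)
            throws IOException {
        if (closed.get()) {
            // Launcher thread lost the shutdown race; don't touch the DB.
            throw new IOException("NM state store is already closed");
        }
        // ... db.put(diagnosticsKey(containerId), diagnostics) would go here ...
    }

    public void close() {
        closed.set(true);
    }
}
```

In the real service the same idea would have to be synchronized with the in-flight write, but the sketch shows the shape of the check.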
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574306#comment-14574306 ]

Lavkesh Lahngir commented on YARN-3745:
---------------------------------------
Uh. Yes, you are right: cls.getConstructor() throws SecurityException, but we don't need to declared it to be thrown. We only need to capture NoSuchMethodException.

> SerializedException should also try to instantiate internal exception with the default constructor
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3745
>                 URL: https://issues.apache.org/jira/browse/YARN-3745
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Lavkesh Lahngir
>            Assignee: Lavkesh Lahngir
>         Attachments: YARN-3745.1.patch, YARN-3745.patch
>
> While deserialising a SerializedException, instantiateException() tries to create the internal exception with cn = cls.getConstructor(String.class).
> If cls does not have a constructor with a String parameter, it throws NoSuchMethodException, for example for the ClosedChannelException class.
> We should also try to instantiate the exception with the default constructor so that the inner exception can be propagated.
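The fallback being discussed — try the (String) constructor first, then the default constructor — can be sketched as follows. This is an illustrative stand-in for SerializedExceptionPBImpl's instantiateException, not the actual patch; only NoSuchMethodException is caught, per the comment above.

```java
import java.lang.reflect.Constructor;
import java.nio.channels.ClosedChannelException;

// Hypothetical sketch of the two-step instantiation discussed in YARN-3745.
public class InstantiateSketch {
    public static <T extends Throwable> T instantiate(Class<T> cls, String message)
            throws Exception {
        try {
            Constructor<T> cn = cls.getConstructor(String.class);
            return cn.newInstance(message);
        } catch (NoSuchMethodException e) {
            // e.g. ClosedChannelException has no (String) constructor;
            // fall back to the default constructor so the inner exception
            // can still be propagated.
            return cls.getConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        // Falls back to the no-arg constructor:
        System.out.println(instantiate(ClosedChannelException.class, "ignored")
            .getClass().getSimpleName());
        // Uses the (String) constructor directly:
        System.out.println(instantiate(IllegalArgumentException.class, "boom").getMessage());
    }
}
```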
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574308#comment-14574308 ]

Lavkesh Lahngir commented on YARN-3745:
---------------------------------------
Sorry, typo: we don't need to declare it to be thrown.
[jira] [Updated] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lavkesh Lahngir updated YARN-3745:
----------------------------------
    Attachment: YARN-3745.2.patch
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574332#comment-14574332 ]

Lavkesh Lahngir commented on YARN-3745:
---------------------------------------
The deSerialize() method throws ClassNotFoundException, wrapped in a YarnRuntimeException, if there are class-loading issues (other tests also have this). No other exception should be thrown for the test to pass.
[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574347#comment-14574347 ]

Hudson commented on YARN-3766:
------------------------------
FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/])
YARN-3766. Fixed the apps table column error of generic history web UI. Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java

> ATS Web UI breaks because of YARN-3467
> --------------------------------------
>
>                 Key: YARN-3766
>                 URL: https://issues.apache.org/jira/browse/YARN-3766
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, webapp, yarn
>    Affects Versions: 2.8.0
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>            Priority: Blocker
>             Fix For: 2.8.0
>
>         Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch
>
> The ATS web UI breaks because of the following changes made in YARN-3467.
> {code}
> +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
> @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
>        .append(", 'mRender': renderHadoopDate }")
>        .append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
>      if (isFairSchedulerPage) {
> -      sb.append("[11]");
> +      sb.append("[13]");
>      } else if (isResourceManager) {
> -      sb.append("[10]");
> +      sb.append("[12]");
>      } else {
>        sb.append("[9]");
>      }
> {code}
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574343#comment-14574343 ]

Hudson commented on YARN-3764:
------------------------------
FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to another. Contributed by Wangda Tan (jianhe: rev 6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java

> CapacityScheduler should forbid moving LeafQueue from one parent to another
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3764
>                 URL: https://issues.apache.org/jira/browse/YARN-3764
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>             Fix For: 2.7.1
>
>         Attachments: YARN-3764.1.patch
>
> Currently CapacityScheduler doesn't handle the case well, for example:
> A queue structure:
> {code}
> root
>  |
>  a (100)
>  / \
> x   y
> (50) (50)
> {code}
> And reinitialize using the following structure:
> {code}
>     root
>     /  \
> a (50)  x (50)
>  |
>  y
> (100)
> {code}
> The actual queue structure after reinitialize is:
> {code}
>     root
>     /  \
> a (50)  x (50)
>  / \
> x   y
> (50) (100)
> {code}
> We should forbid admins from doing that.
[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures
[ https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574350#comment-14574350 ]

Hudson commented on YARN-2392:
------------------------------
FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt

> add more diags about app retry limits on AM failures
> ----------------------------------------------------
>
>                 Key: YARN-2392
>                 URL: https://issues.apache.org/jira/browse/YARN-2392
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>             Fix For: 2.8.0
>
>         Attachments: YARN-2392-001.patch, YARN-2392-002.patch, YARN-2392-002.patch
>
> # When an app fails, the failure count is shown, but not what the global and local limits are. If the two are different, they should both be printed.
> # The YARN-2242 strings don't have enough whitespace between the text and the URL.
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574348#comment-14574348 ] Hudson commented on YARN-41: FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/]) YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/s
[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574353#comment-14574353 ] Hudson commented on YARN-3733: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/]) YARN-3733. Fix DominantRC#compare() does not work as expected if cluster resource is empty. (Rohith Sharmaks via wangda) (wangda: rev ebd797c48fe236b404cf3a125ac9d1f7714e291e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java Add missing test file of YARN-3733 (wangda: rev 405bbcf68c32d8fd8a83e46e686eacd14e5a533c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Fix For: 2.7.1 > > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. 
Switch RM > Actual > = > For 12 jobs the AM gets allocated and all 12 start running > No other YARN child is initiated, *all 12 jobs stay in RUNNING state forever* > Expected > === > Only 6 should be running at a time since the max AM share is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3770) SerializedException should also handle java.lang.Error
[ https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lavkesh Lahngir updated YARN-3770: -- Attachment: YARN-3770.patch > SerializedException should also handle java.lang.Error > --- > > Key: YARN-3770 > URL: https://issues.apache.org/jira/browse/YARN-3770 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: YARN-3770.patch > > > In SerializedExceptionPBImpl's deserialize() method: > {code} > Class classType = null; > if (YarnException.class.isAssignableFrom(realClass)) { > classType = YarnException.class; > } else if (IOException.class.isAssignableFrom(realClass)) { > classType = IOException.class; > } else if (RuntimeException.class.isAssignableFrom(realClass)) { > classType = RuntimeException.class; > } else { > classType = Exception.class; > } > return instantiateException(realClass.asSubclass(classType), getMessage(), > cause == null ? null : cause.deSerialize()); > } > {code} > If realClass is a subclass of java.lang.Error, deSerialize() throws > ClassCastException. > In the last else statement, classType should be Throwable.class > instead of Exception.class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
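A minimal standalone sketch of the failure mode described above. The pick() method here is a hypothetical stand-in for the classType selection in SerializedExceptionPBImpl#deSerialize (simplified; the YarnException branch is dropped to keep it self-contained); the point is that Class#asSubclass rejects Error subclasses when the fallback is Exception.class, while Throwable.class accepts them.

```java
import java.io.IOException;

// Sketch (outside YARN) of the branch logic quoted above.
// pick() is a hypothetical stand-in for the selection in
// SerializedExceptionPBImpl#deSerialize, with the proposed fix applied:
// the final fallback is Throwable.class rather than Exception.class.
public class ClassTypeDemo {
    static Class<? extends Throwable> pick(Class<?> realClass) {
        Class<? extends Throwable> classType;
        if (IOException.class.isAssignableFrom(realClass)) {
            classType = IOException.class;
        } else if (RuntimeException.class.isAssignableFrom(realClass)) {
            classType = RuntimeException.class;
        } else {
            // Was Exception.class: asSubclass would then throw
            // ClassCastException for any java.lang.Error subclass.
            classType = Throwable.class;
        }
        return realClass.asSubclass(classType);
    }

    public static void main(String[] args) {
        // With the old fallback, an Error subclass fails the cast:
        boolean castFailed = false;
        try {
            StackOverflowError.class.asSubclass(Exception.class);
        } catch (ClassCastException e) {
            castFailed = true;
        }
        System.out.println(castFailed); // true
        // With Throwable.class the same class passes:
        System.out.println(pick(StackOverflowError.class) == StackOverflowError.class); // true
    }
}
```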
[jira] [Moved] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]
[ https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel moved HDFS-8526 to YARN-3771: --- Key: YARN-3771 (was: HDFS-8526) Project: Hadoop YARN (was: Hadoop HDFS) > "final" behavior is not honored for > YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[] > > > Key: YARN-3771 > URL: https://issues.apache.org/jira/browse/YARN-3771 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > > I was going through some FindBugs rules. One issue reported there is that > public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { > and > public static final String[] > DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH = > do not honor the final qualifier. The string array contents can be > reassigned! > Simple test: > {code} > public class TestClass { > static final String[] t = { "1", "2" }; > public static void main(String[] args) { > System.out.println(12 < 10); > String[] t1 = { "u" }; > // t = t1; // this will show a compilation error > t[1] = t1[1]; // But this compiles > } > } > {code} > One option is to use Collections.unmodifiableList. > Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
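The finding above can be sketched in a few lines: "final" only freezes the array reference, not its elements, and wrapping the values in Collections.unmodifiableList is one way to make the contents read-only. The field names here are illustrative, not the actual YarnConfiguration constants.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of the FindBugs finding: a final array's contents stay mutable.
// CLASSPATH / SAFE_CLASSPATH are illustrative names, not YARN's fields.
public class FinalArrayDemo {
    static final String[] CLASSPATH = { "a.jar", "b.jar" };

    // One fix: expose an unmodifiable List view instead of the raw array.
    static final List<String> SAFE_CLASSPATH =
        Collections.unmodifiableList(Arrays.asList("a.jar", "b.jar"));

    public static void main(String[] args) {
        CLASSPATH[0] = "evil.jar";        // compiles and runs: elements are mutable
        System.out.println(CLASSPATH[0]); // evil.jar
        try {
            SAFE_CLASSPATH.set(0, "evil.jar");
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable"); // the list view rejects writes
        }
    }
}
```

Callers that still need a String[] can copy on the way out (e.g. SAFE_CLASSPATH.toArray(new String[0])), so each caller gets its own mutable copy while the canonical values stay protected.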
[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures
[ https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574388#comment-14574388 ] Hudson commented on YARN-2392: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/949/]) YARN-2392. Add more diags about app retry limits on AM failures. Contributed by Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt > add more diags about app retry limits on AM failures > > > Key: YARN-2392 > URL: https://issues.apache.org/jira/browse/YARN-2392 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-2392-001.patch, YARN-2392-002.patch, > YARN-2392-002.patch > > > # when an app fails the failure count is shown, but not what the global + > local limits are. If the two are different, they should both be printed. > # the YARN-2242 strings don't have enough whitespace between text and the URL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574385#comment-14574385 ] Hudson commented on YARN-3766: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/949/]) YARN-3766. Fixed the apps table column error of generic history web UI. Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java > ATS Web UI breaks because of YARN-3467 > -- > > Key: YARN-3766 > URL: https://issues.apache.org/jira/browse/YARN-3766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.8.0 > > Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch > > > The ATS web UI breaks because of the following changes made in YARN-3467. > {code} > +++ > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( >.append(", 'mRender': renderHadoopDate }") >.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':"); > if (isFairSchedulerPage) { > - sb.append("[11]"); > + sb.append("[13]"); > } else if (isResourceManager) { > - sb.append("[10]"); > + sb.append("[12]"); > } else { >sb.append("[9]"); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574391#comment-14574391 ] Hudson commented on YARN-3733: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/949/]) YARN-3733. Fix DominantRC#compare() does not work as expected if cluster resource is empty. (Rohith Sharmaks via wangda) (wangda: rev ebd797c48fe236b404cf3a125ac9d1f7714e291e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java Add missing test file of YARN-3733 (wangda: rev 405bbcf68c32d8fd8a83e46e686eacd14e5a533c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Fix For: 2.7.1 > > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. 
Switch RM > Actual > = > For 12 jobs the AM gets allocated and all 12 start running > No other YARN child is initiated, *all 12 jobs stay in RUNNING state forever* > Expected > === > Only 6 should be running at a time since the max AM share is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574381#comment-14574381 ] Hudson commented on YARN-3764: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/949/]) YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to another. Contributed by Wangda Tan (jianhe: rev 6ad4e59cfc111a92747fdb1fb99cc6378044832a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java > CapacityScheduler should forbid moving LeafQueue from one parent to another > --- > > Key: YARN-3764 > URL: https://issues.apache.org/jira/browse/YARN-3764 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.1 > > Attachments: YARN-3764.1.patch > > > Currently CapacityScheduler doesn't handle the case well, for example: > A queue structure: > {code} > root > | > a (100) > / \ >x y > (50) (50) > {code} > And reinitialize using following structure: > {code} > root > / \ > (50)a x (50) > | > y >(100) > {code} > The actual queue structure after reinitialize is: > {code} > root > /\ >a (50) x (50) > / \ > xy > (50) (100) > {code} > We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574386#comment-14574386 ] Hudson commented on YARN-41: SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/949/]) YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apa
[jira] [Commented] (YARN-3758) The minimum memory setting (yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574393#comment-14574393 ] Rohith commented on YARN-3758: -- All this confusion should probably be resolved by YARN-2986. This issue can be raised there to ask whether they will handle it. > The minimum memory setting (yarn.scheduler.minimum-allocation-mb) is not > working as expected in FairScheduler > > > Key: YARN-3758 > URL: https://issues.apache.org/jira/browse/YARN-3758 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: skrho > > Hello there~~ > I have 2 clusters. > The first cluster is 5 nodes, default 1 application queue, CapacityScheduler, 8 GB > physical memory per node. > The second cluster is 10 nodes, 2 application queues, FairScheduler, 230 GB > physical memory per node. > Whenever a MapReduce job is running, I want the ResourceManager to set the > minimum container memory to 256 MB. > So I changed the configuration in yarn-site.xml & mapred-site.xml: > yarn.scheduler.minimum-allocation-mb : 256 > mapreduce.map.java.opts : -Xms256m > mapreduce.reduce.java.opts : -Xms256m > mapreduce.map.memory.mb : 256 > mapreduce.reduce.memory.mb : 256 > In the first cluster, whenever a MapReduce job is running, I can see 256 MB of used > memory in the web console ( http://installedIP:8088/cluster/nodes ). > But in the second cluster, whenever a MapReduce job is running, I can see 1024 MB of used > memory in the web console ( http://installedIP:8088/cluster/nodes ). > I know the default memory value is 1024 MB, so if the memory setting is not > changed, the default value applies. > I have been testing for two weeks, but I don't know why the minimum memory > setting is not working in the second cluster. > Why does this difference happen? > Is my configuration wrong? > Or is there a bug? > Thank you for reading~~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]
[ https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3771: Attachment: 0001-YARN-3771.patch Attached the patch. Please review. > "final" behavior is not honored for > YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[] > > > Key: YARN-3771 > URL: https://issues.apache.org/jira/browse/YARN-3771 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3771.patch > > > I was going through some FindBugs rules. One issue reported there is that > public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { > and > public static final String[] > DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH = > do not honor the final qualifier. The string array contents can be > reassigned! > Simple test: > {code} > public class TestClass { > static final String[] t = { "1", "2" }; > public static void main(String[] args) { > System.out.println(12 < 10); > String[] t1 = { "u" }; > // t = t1; // this will show a compilation error > t[1] = t1[1]; // But this compiles > } > } > {code} > One option is to use Collections.unmodifiableList. > Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574402#comment-14574402 ] Hadoop QA commented on YARN-3745: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 21m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 30s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 10s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. 
| | | | 50m 48s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737915/YARN-3745.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 790a861 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8196/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8196/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8196/console | This message was automatically generated. > SerializedException should also try to instantiate internal exception with > the default constructor > -- > > Key: YARN-3745 > URL: https://issues.apache.org/jira/browse/YARN-3745 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: YARN-3745.1.patch, YARN-3745.2.patch, YARN-3745.patch > > > While deserializing a SerializedException, it tries to create the internal > exception in instantiateException() with cn = > cls.getConstructor(String.class). > If cls does not have a constructor with a String parameter, it throws > NoSuchMethodException, > for example for the ClosedChannelException class. > We should also try to instantiate the exception with the default constructor so that > the inner exception can be propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574421#comment-14574421 ] Lavkesh Lahngir commented on YARN-3745: --- [~zxu] Sorry, my bad. It *must* throw ClassNotFoundException because there was no call to pb.init(cause); > SerializedException should also try to instantiate internal exception with > the default constructor > -- > > Key: YARN-3745 > URL: https://issues.apache.org/jira/browse/YARN-3745 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: YARN-3745.1.patch, YARN-3745.2.patch, YARN-3745.patch > > > While deserializing a SerializedException, it tries to create the internal > exception in instantiateException() with cn = > cls.getConstructor(String.class). > If cls does not have a constructor with a String parameter, it throws > NoSuchMethodException, > for example for the ClosedChannelException class. > We should also try to instantiate the exception with the default constructor so that > the inner exception can be propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
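The fallback proposed in this issue can be sketched as follows. This is a hypothetical simplification, not the actual YARN patch: instantiate() stands in for SerializedExceptionPBImpl#instantiateException, first trying the (String) constructor and, if the class lacks one (as ClosedChannelException does), falling back to the no-arg constructor, then attaching the cause afterwards.

```java
import java.nio.channels.ClosedChannelException;

// Hedged sketch of the proposed fallback (not the actual YARN code):
// try the (String) constructor first, then the default constructor.
public class InstantiateDemo {
    static Throwable instantiate(Class<? extends Throwable> cls,
                                 String message, Throwable cause) throws Exception {
        Throwable t;
        try {
            t = cls.getConstructor(String.class).newInstance(message);
        } catch (NoSuchMethodException e) {
            // e.g. ClosedChannelException has no (String) constructor;
            // fall back so the inner exception can still be propagated.
            t = cls.getConstructor().newInstance();
        }
        if (cause != null) {
            t.initCause(cause); // legal here: neither path pre-set a cause
        }
        return t;
    }

    public static void main(String[] args) throws Exception {
        Throwable inner = instantiate(ClosedChannelException.class, "ignored", null);
        System.out.println(inner.getClass().getSimpleName()); // ClosedChannelException
        Throwable outer = instantiate(IllegalStateException.class, "boom", inner);
        System.out.println(outer.getMessage());        // boom
        System.out.println(outer.getCause() == inner); // true
    }
}
```

Note the caveat in the comment: Throwable#initCause throws IllegalStateException if a cause was already set by the constructor, so a production version would need to guard that call for exception classes whose constructors initialize the cause themselves.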
[jira] [Commented] (YARN-3770) SerializedException should also handle java.lang.Error
[ https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574439#comment-14574439 ] Hadoop QA commented on YARN-3770: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 1s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 22s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 52s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. 
| | | | 39m 58s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737919/YARN-3770.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 790a861 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8197/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8197/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8197/console | This message was automatically generated. > SerializedException should also handle java.lang.Error > --- > > Key: YARN-3770 > URL: https://issues.apache.org/jira/browse/YARN-3770 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: YARN-3770.patch > > > In SerializedExceptionPBImpl's deserialize() method: > {code} > Class classType = null; > if (YarnException.class.isAssignableFrom(realClass)) { > classType = YarnException.class; > } else if (IOException.class.isAssignableFrom(realClass)) { > classType = IOException.class; > } else if (RuntimeException.class.isAssignableFrom(realClass)) { > classType = RuntimeException.class; > } else { > classType = Exception.class; > } > return instantiateException(realClass.asSubclass(classType), getMessage(), > cause == null ? null : cause.deSerialize()); > } > {code} > If realClass is a subclass of java.lang.Error, deSerialize() throws > ClassCastException. > In the last else statement, classType should be Throwable.class > instead of Exception.class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
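The failure mode is visible with asSubclass() alone. A minimal sketch of why the Exception.class fallback breaks for Error subclasses, while Throwable.class works for both (this demo is self-contained and independent of the Hadoop code):

```java
public class AsSubclassDemo {
    public static void main(String[] args) {
        Class<?> realClass = OutOfMemoryError.class; // any java.lang.Error subclass

        // Old fallback: asSubclass(Exception.class) throws ClassCastException,
        // because Error is not a subclass of Exception.
        boolean oldFallbackFails;
        try {
            realClass.asSubclass(Exception.class);
            oldFallbackFails = false;
        } catch (ClassCastException e) {
            oldFallbackFails = true;
        }

        // Proposed fallback: Throwable.class covers Errors and Exceptions alike.
        Class<? extends Throwable> ok = realClass.asSubclass(Throwable.class);

        System.out.println(oldFallbackFails + " " + ok.getSimpleName());
    }
}
```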
[jira] [Updated] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval
[ https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3259: Attachment: YARN-3259.003.patch Addressed feedback > FairScheduler: Update to fairShare could be triggered early on node events > instead of waiting for update interval > -- > > Key: YARN-3259 > URL: https://issues.apache.org/jira/browse/YARN-3259 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3259.001.patch, YARN-3259.002.patch, > YARN-3259.003.patch > > > Instead of waiting for update interval unconditionally, we can trigger early > updates on important events - for eg node join and leave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3772) Let AMs change the name of the application upon RM registration
Zoltán Zvara created YARN-3772: -- Summary: Let AMs change the name of the application upon RM registration Key: YARN-3772 URL: https://issues.apache.org/jira/browse/YARN-3772 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Zoltán Zvara Many applications set their name internally through their own APIs, but also want that name displayed on YARN. It is therefore not always possible to know the name of the application at submission time. YARN should let AMs change their name, at least the first time they register with the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574563#comment-14574563 ] Hudson commented on YARN-3764: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/]) YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to another. Contributed by Wangda Tan (jianhe: rev 6ad4e59cfc111a92747fdb1fb99cc6378044832a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt > CapacityScheduler should forbid moving LeafQueue from one parent to another > --- > > Key: YARN-3764 > URL: https://issues.apache.org/jira/browse/YARN-3764 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.1 > > Attachments: YARN-3764.1.patch > > > Currently CapacityScheduler doesn't handle the case well, for example: > A queue structure: > {code} > root > | > a (100) > / \ >x y > (50) (50) > {code} > And reinitialize using following structure: > {code} > root > / \ > (50)a x (50) > | > y >(100) > {code} > The actual queue structure after reinitialize is: > {code} > root > /\ >a (50) x (50) > / \ > xy > (50) (100) > {code} > We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures
[ https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574571#comment-14574571 ] Hudson commented on YARN-2392: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/]) YARN-2392. Add more diags about app retry limits on AM failures. Contributed by Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/CHANGES.txt > add more diags about app retry limits on AM failures > > > Key: YARN-2392 > URL: https://issues.apache.org/jira/browse/YARN-2392 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-2392-001.patch, YARN-2392-002.patch, > YARN-2392-002.patch > > > # when an app fails the failure count is shown, but not what the global + > local limits are. If the two are different, they should both be printed. > # the YARN-2242 strings don't have enough whitespace between text and the URL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574569#comment-14574569 ] Hudson commented on YARN-41: SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/]) YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn
[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574567#comment-14574567 ] Hudson commented on YARN-3766: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/]) YARN-3766. Fixed the apps table column error of generic history web UI. Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > ATS Web UI breaks because of YARN-3467 > -- > > Key: YARN-3766 > URL: https://issues.apache.org/jira/browse/YARN-3766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.8.0 > > Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch > > > The ATS web UI breaks because of the following changes made in YARN-3467. > {code} > +++ > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( >.append(", 'mRender': renderHadoopDate }") >.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':"); > if (isFairSchedulerPage) { > - sb.append("[11]"); > + sb.append("[13]"); > } else if (isResourceManager) { > - sb.append("[10]"); > + sb.append("[12]"); > } else { >sb.append("[9]"); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574574#comment-14574574 ] Hudson commented on YARN-3733: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/]) YARN-3733. Fix DominantRC#compare() does not work as expected if cluster resource is empty. (Rohith Sharmaks via wangda) (wangda: rev ebd797c48fe236b404cf3a125ac9d1f7714e291e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java Add missing test file of YARN-3733 (wangda: rev 405bbcf68c32d8fd8a83e46e686eacd14e5a533c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Fix For: 2.7.1 > > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. 
Switch RM > Actual > = > For 12 jobs the AM gets allocated and all 12 start running > No other YARN child is initiated, *all 12 jobs stay in Running state forever* > Expected > === > Only 6 should be running at a time, since the max AM limit is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
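The bug class described above can be modeled in a few lines: a dominant share divides by the cluster totals, so an all-zero cluster resource (e.g. right after an RM switch, before NMs re-register) makes every comparison 0 vs 0 and compare() reports equality. One fix direction is falling back to comparing absolute values when the cluster resource is empty. This is an illustrative sketch under those assumptions, not the actual DominantResourceCalculator code:

```java
public class DominantShareSketch {
    // Compare two resource requests by dominant share against the cluster
    // totals; when the cluster resource is empty, fall back to comparing
    // the absolute values componentwise instead of dividing by zero totals.
    static int compare(long clusterMem, long clusterVcores,
                       long lhsMem, long lhsVcores,
                       long rhsMem, long rhsVcores) {
        if (clusterMem == 0 && clusterVcores == 0) {
            // Empty cluster: shares are meaningless, compare raw values.
            if (lhsMem != rhsMem) {
                return Long.compare(lhsMem, rhsMem);
            }
            return Long.compare(lhsVcores, rhsVcores);
        }
        double l = Math.max(ratio(lhsMem, clusterMem), ratio(lhsVcores, clusterVcores));
        double r = Math.max(ratio(rhsMem, clusterMem), ratio(rhsVcores, clusterVcores));
        return Double.compare(l, r);
    }

    static double ratio(long used, long total) {
        return total == 0 ? 0.0 : (double) used / total;
    }

    public static void main(String[] args) {
        // With an empty cluster, a 1024 MB request still compares greater
        // than a 512 MB one instead of collapsing to "equal".
        System.out.println(compare(0, 0, 1024, 1, 512, 1));
    }
}
```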
[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574607#comment-14574607 ] Hudson commented on YARN-3733: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/]) YARN-3733. Fix DominantRC#compare() does not work as expected if cluster resource is empty. (Rohith Sharmaks via wangda) (wangda: rev ebd797c48fe236b404cf3a125ac9d1f7714e291e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java Add missing test file of YARN-3733 (wangda: rev 405bbcf68c32d8fd8a83e46e686eacd14e5a533c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Fix For: 2.7.1 > > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. 
Switch RM > Actual > = > For 12 jobs the AM gets allocated and all 12 start running > No other YARN child is initiated, *all 12 jobs stay in Running state forever* > Expected > === > Only 6 should be running at a time, since the max AM limit is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574596#comment-14574596 ] Hudson commented on YARN-3764: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/]) YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to another. Contributed by Wangda Tan (jianhe: rev 6ad4e59cfc111a92747fdb1fb99cc6378044832a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java > CapacityScheduler should forbid moving LeafQueue from one parent to another > --- > > Key: YARN-3764 > URL: https://issues.apache.org/jira/browse/YARN-3764 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.1 > > Attachments: YARN-3764.1.patch > > > Currently CapacityScheduler doesn't handle the case well, for example: > A queue structure: > {code} > root > | > a (100) > / \ >x y > (50) (50) > {code} > And reinitialize using following structure: > {code} > root > / \ > (50)a x (50) > | > y >(100) > {code} > The actual queue structure after reinitialize is: > {code} > root > /\ >a (50) x (50) > / \ > xy > (50) (100) > {code} > We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures
[ https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574604#comment-14574604 ] Hudson commented on YARN-2392: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/]) YARN-2392. Add more diags about app retry limits on AM failures. Contributed by Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java > add more diags about app retry limits on AM failures > > > Key: YARN-2392 > URL: https://issues.apache.org/jira/browse/YARN-2392 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-2392-001.patch, YARN-2392-002.patch, > YARN-2392-002.patch > > > # when an app fails the failure count is shown, but not what the global + > local limits are. If the two are different, they should both be printed. > # the YARN-2242 strings don't have enough whitespace between text and the URL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574602#comment-14574602 ] Hudson commented on YARN-41: FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/]) YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yar
[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574600#comment-14574600 ] Hudson commented on YARN-3766: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/]) YARN-3766. Fixed the apps table column error of generic history web UI. Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > ATS Web UI breaks because of YARN-3467 > -- > > Key: YARN-3766 > URL: https://issues.apache.org/jira/browse/YARN-3766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.8.0 > > Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch > > > The ATS web UI breaks because of the following changes made in YARN-3467. > {code} > +++ > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( >.append(", 'mRender': renderHadoopDate }") >.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':"); > if (isFairSchedulerPage) { > - sb.append("[11]"); > + sb.append("[13]"); > } else if (isResourceManager) { > - sb.append("[10]"); > + sb.append("[12]"); > } else { >sb.append("[9]"); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574630#comment-14574630 ] Hudson commented on YARN-3733: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2165 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2165/]) YARN-3733. Fix DominantRC#compare() does not work as expected if cluster resource is empty. (Rohith Sharmaks via wangda) (wangda: rev ebd797c48fe236b404cf3a125ac9d1f7714e291e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java Add missing test file of YARN-3733 (wangda: rev 405bbcf68c32d8fd8a83e46e686eacd14e5a533c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Fix For: 2.7.1 > > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. 
Switch RM > Actual > = > For 12 jobs the AM gets allocated and all 12 start running > No other YARN child is initiated, *all 12 jobs stay in Running state forever* > Expected > === > Only 6 should be running at a time, since the max AM limit is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures
[ https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574627#comment-14574627 ] Hudson commented on YARN-2392: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2165 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2165/]) YARN-2392. Add more diags about app retry limits on AM failures. Contributed by Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java > add more diags about app retry limits on AM failures > > > Key: YARN-2392 > URL: https://issues.apache.org/jira/browse/YARN-2392 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-2392-001.patch, YARN-2392-002.patch, > YARN-2392-002.patch > > > # when an app fails the failure count is shown, but not what the global + > local limits are. If the two are different, they should both be printed. > # the YARN-2242 strings don't have enough whitespace between text and the URL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574624#comment-14574624 ] Hudson commented on YARN-3766: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2165 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2165/]) YARN-3766. Fixed the apps table column error of generic history web UI. Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java > ATS Web UI breaks because of YARN-3467 > -- > > Key: YARN-3766 > URL: https://issues.apache.org/jira/browse/YARN-3766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.8.0 > > Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch > > > The ATS web UI breaks because of the following changes made in YARN-3467. > {code} > +++ > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( >.append(", 'mRender': renderHadoopDate }") >.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':"); > if (isFairSchedulerPage) { > - sb.append("[11]"); > + sb.append("[13]"); > } else if (isResourceManager) { > - sb.append("[10]"); > + sb.append("[12]"); > } else { >sb.append("[9]"); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
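The diff quoted above hard-codes a DataTables column index ('aTargets') per page type; the index shift from 10/11 to 12/13 suggests YARN-3467 added two columns to the RM and fair-scheduler apps tables while the generic history (ATS) table was left unchanged, which is why the ATS UI broke until its branch was accounted for. A hedged sketch of that invariant (names are illustrative, not the actual WebPageUtils code):

```java
// Illustrative sketch, not the real WebPageUtils: the DataTables 'aTargets'
// value is a column index, so it must track how many columns each apps table
// actually renders. The RM and fair-scheduler tables gained two columns in
// YARN-3467 (10 -> 12, 11 -> 13); the ATS table stayed at index 9.
public class AppsTableColumnIndex {
    static int numericColumnTarget(boolean fairSchedulerPage, boolean resourceManager) {
        if (fairSchedulerPage) return 13;
        if (resourceManager) return 12;
        return 9; // generic history (ATS) page: column count unchanged
    }

    public static void main(String[] args) {
        if (numericColumnTarget(true, false) != 13) throw new AssertionError();
        if (numericColumnTarget(false, true) != 12) throw new AssertionError();
        if (numericColumnTarget(false, false) != 9) throw new AssertionError();
        System.out.println("ok");
    }
}
```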
[jira] [Created] (YARN-3773) adoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable
Alan Burlison created YARN-3773: --- Summary: adoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable Key: YARN-3773 URL: https://issues.apache.org/jira/browse/YARN-3773 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: BSD OSX Solaris Windows Linux Reporter: Alan Burlison hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c makes use of the Linux-only executable /sbin/tc (http://lartc.org/manpages/tc.txt) but there is no corresponding functionality for non-Linux platforms. The code in question also seems to try to execute tc even on platforms where it will never exist. Other platforms provide similar functionality, e.g. Solaris has an extensive range of network management features (http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-095-s11-app-traffic-525038.html). Work is needed to abstract the network management features of YARN so that the same facilities for network management can be provided on all platforms that provide the requisite functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3773) hadoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable
[ https://issues.apache.org/jira/browse/YARN-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3773: Summary: hadoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable (was: adoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable) > hadoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable > -- > > Key: YARN-3773 > URL: https://issues.apache.org/jira/browse/YARN-3773 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Environment: BSD OSX Solaris Windows Linux >Reporter: Alan Burlison > > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c > makes use of the Linux-only executable /sbin/tc > (http://lartc.org/manpages/tc.txt) but there is no corresponding > functionality for non-Linux platforms. The code in question also seems to try > to execute tc even on platforms where it will never exist. > Other platforms provide similar functionality, e.g. Solaris has an extensive > range of network management features > (http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-095-s11-app-traffic-525038.html). > Work is needed to abstract the network management features of YARN so that > the same facilities for network management can be provided on all platforms > that provide the requisite functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574620#comment-14574620 ] Hudson commented on YARN-3764: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2165 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2165/]) YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to another. Contributed by Wangda Tan (jianhe: rev 6ad4e59cfc111a92747fdb1fb99cc6378044832a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java > CapacityScheduler should forbid moving LeafQueue from one parent to another > --- > > Key: YARN-3764 > URL: https://issues.apache.org/jira/browse/YARN-3764 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.1 > > Attachments: YARN-3764.1.patch > > > Currently CapacityScheduler doesn't handle the case well, for example: > A queue structure: > {code} > root > | > a (100) > / \ >x y > (50) (50) > {code} > And reinitialize using following structure: > {code} > root > / \ > (50)a x (50) > | > y >(100) > {code} > The actual queue structure after reinitialize is: > {code} > root > /\ >a (50) x (50) > / \ > xy > (50) (100) > {code} > We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
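The queue diagrams above show why a reinitialize that re-parents a queue is dangerous: the scheduler ends up with a structure neither the old nor the new config describes. A hedged sketch of the validation this JIRA calls for, assuming a simple name-to-path map (the real check lives in CapacityScheduler's queue-parsing code; names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the YARN-3764 validation: on reinitialize, reject
// any configuration that moves an existing queue under a different parent.
public class QueueMoveCheck {

    /** Each map goes from queue name to its full path, e.g. "y" -> "root.a.y". */
    static void validate(Map<String, String> oldPaths, Map<String, String> newPaths) {
        for (Map.Entry<String, String> e : oldPaths.entrySet()) {
            String newPath = newPaths.get(e.getKey());
            if (newPath != null && !newPath.equals(e.getValue())) {
                throw new IllegalStateException("Moving queue " + e.getKey()
                    + " from " + e.getValue() + " to " + newPath + " is forbidden");
            }
        }
    }

    public static void main(String[] args) {
        // The structures from the JIRA description above.
        Map<String, String> before = new HashMap<>();
        before.put("x", "root.a.x");
        before.put("y", "root.a.y");
        Map<String, String> after = new HashMap<>();
        after.put("x", "root.x");   // x re-parented directly under root
        after.put("y", "root.x.y"); // y re-parented under x
        boolean rejected = false;
        try {
            validate(before, after);
        } catch (IllegalStateException expected) {
            rejected = true;
        }
        if (!rejected) throw new AssertionError("re-parenting must be rejected");
        System.out.println("ok");
    }
}
```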
[jira] [Assigned] (YARN-3724) Native compilation on Solaris fails on Yarn due to use of FTS
[ https://issues.apache.org/jira/browse/YARN-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison reassigned YARN-3724: --- Assignee: Alan Burlison > Native compilation on Solaris fails on Yarn due to use of FTS > - > > Key: YARN-3724 > URL: https://issues.apache.org/jira/browse/YARN-3724 > Project: Hadoop YARN > Issue Type: Sub-task > Environment: Solaris 11.2 >Reporter: Malcolm Kavalsky >Assignee: Alan Burlison > Original Estimate: 24h > Remaining Estimate: 24h > > Compiling the Yarn Node Manager results in "fts" not found. On Solaris we > have an alternative ftw with similar functionality. > This is isolated to a single file container-executor.c > Note that this will just fix the compilation error. A more serious issue is > that Solaris does not support cgroups as Linux does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574652#comment-14574652 ] Hudson commented on YARN-3766: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #217 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/217/]) YARN-3766. Fixed the apps table column error of generic history web UI. Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java > ATS Web UI breaks because of YARN-3467 > -- > > Key: YARN-3766 > URL: https://issues.apache.org/jira/browse/YARN-3766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.8.0 > > Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch > > > The ATS web UI breaks because of the following changes made in YARN-3467. > {code} > +++ > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( >.append(", 'mRender': renderHadoopDate }") >.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':"); > if (isFairSchedulerPage) { > - sb.append("[11]"); > + sb.append("[13]"); > } else if (isResourceManager) { > - sb.append("[10]"); > + sb.append("[12]"); > } else { >sb.append("[9]"); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574648#comment-14574648 ] Hudson commented on YARN-3764: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #217 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/217/]) YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to another. Contributed by Wangda Tan (jianhe: rev 6ad4e59cfc111a92747fdb1fb99cc6378044832a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt > CapacityScheduler should forbid moving LeafQueue from one parent to another > --- > > Key: YARN-3764 > URL: https://issues.apache.org/jira/browse/YARN-3764 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.1 > > Attachments: YARN-3764.1.patch > > > Currently CapacityScheduler doesn't handle the case well, for example: > A queue structure: > {code} > root > | > a (100) > / \ >x y > (50) (50) > {code} > And reinitialize using following structure: > {code} > root > / \ > (50)a x (50) > | > y >(100) > {code} > The actual queue structure after reinitialize is: > {code} > root > /\ >a (50) x (50) > / \ > xy > (50) (100) > {code} > We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures
[ https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574655#comment-14574655 ] Hudson commented on YARN-2392: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #217 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/217/]) YARN-2392. Add more diags about app retry limits on AM failures. Contributed by Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java > add more diags about app retry limits on AM failures > > > Key: YARN-2392 > URL: https://issues.apache.org/jira/browse/YARN-2392 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-2392-001.patch, YARN-2392-002.patch, > YARN-2392-002.patch > > > # when an app fails the failure count is shown, but not what the global + > local limits are. If the two are different, they should both be printed. > # the YARN-2242 strings don't have enough whitespace between text and the URL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574658#comment-14574658 ] Hudson commented on YARN-3733: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #217 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/217/]) YARN-3733. Fix DominantRC#compare() does not work as expected if cluster resource is empty. (Rohith Sharmaks via wangda) (wangda: rev ebd797c48fe236b404cf3a125ac9d1f7714e291e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt Add missing test file of YARN-3733 (wangda: rev 405bbcf68c32d8fd8a83e46e686eacd14e5a533c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java > Fix DominantRC#compare() does not work as expected if cluster resource is > empty > --- > > Key: YARN-3733 > URL: https://issues.apache.org/jira/browse/YARN-3733 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 , 2 NM , 2 RM > one NM - 3 GB 6 v core >Reporter: Bibin A Chundatt >Assignee: Rohith >Priority: Blocker > Fix For: 2.7.1 > > Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, > 0002-YARN-3733.patch, YARN-3733.patch > > > Steps to reproduce > = > 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) > 2. Configure map and reduce size to 512 MB after changing scheduler minimum > size to 512 MB > 3. Configure capacity scheduler and AM limit to .5 > (DominantResourceCalculator is configured) > 4. Submit 30 concurrent task > 5. 
Switch RM > Actual > = > For 12 jobs the AM gets allocated and all 12 start running > No other YARN child is initiated, *all 12 jobs stay in Running state forever* > Expected > === > Only 6 should be running at a time since the max AM share is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval
[ https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574665#comment-14574665 ] Hadoop QA commented on YARN-3259: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 59s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 2s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 87m 54s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737948/YARN-3259.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 790a861 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8198/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8198/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8198/console | This message was automatically generated. > FairScheduler: Update to fairShare could be triggered early on node events > instead of waiting for update interval > -- > > Key: YARN-3259 > URL: https://issues.apache.org/jira/browse/YARN-3259 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3259.001.patch, YARN-3259.002.patch, > YARN-3259.003.patch > > > Instead of waiting for update interval unconditionally, we can trigger early > updates on important events - for eg node join and leave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574746#comment-14574746 ] Karthik Kambatla commented on YARN-3655: Thanks for the clarifications, Zhihai. The latest patch looks mostly good, nice test. Few nit picks before we get this in: # In hasContainerForNode, the patch has some spurious changes. Also, would be nice to add a comment for the newly added check. # File a follow-up JIRA to separate out the code paths for assigning a reserved container and a non-reserved container. # File a follow-up JIRA to move all reservation-related tests from TestFairScheduler to TestFairSchedulerReservations > FairScheduler: potential livelock due to maxAMShare limitation and container > reservation > - > > Key: YARN-3655 > URL: https://issues.apache.org/jira/browse/YARN-3655 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3655.000.patch, YARN-3655.001.patch, > YARN-3655.002.patch, YARN-3655.003.patch > > > FairScheduler: potential livelock due to maxAMShare limitation and container > reservation. > If a node is reserved by an application, all the other applications don't > have any chance to assign a new container on this node, unless the > application which reserves the node assigns a new container on this node or > releases the reserved container on this node. > The problem is if an application tries to call assignReservedContainer and > fail to get a new container due to maxAMShare limitation, it will block all > other applications to use the nodes it reserves. If all other running > applications can't release their AM containers due to being blocked by these > reserved containers. A livelock situation can happen. > The following is the code at FSAppAttempt#assignContainer which can cause > this potential livelock. 
> {code} > // Check the AM resource usage for the leaf queue > if (!isAmRunning() && !getUnmanagedAM()) { > List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests(); > if (ask.isEmpty() || !getQueue().canRunAppAM( > ask.get(0).getCapability())) { > if (LOG.isDebugEnabled()) { > LOG.debug("Skipping allocation because maxAMShare limit would " + > "be exceeded"); > } > return Resources.none(); > } > } > {code} > To fix this issue, we can unreserve the node if we can't allocate the AM > container on the node due to Max AM share limitation and the node is reserved > by the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3508) Preemption processing occuring on the main RM dispatcher
[ https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574753#comment-14574753 ] Jason Lowe commented on YARN-3508: -- Yes, it's not a cure-all to move the preemption processing to the scheduler event queue when the scheduler is the bottleneck, but we do have separate event queues for a reason. If it didn't matter who was the bottleneck then we'd just have one event queue for everything, correct? The scheduler event queue is primarily blocked by the big scheduler lock, and IMHO we should dispatch events that need that lock to that queue. Doing otherwise starts to couple the two event dispatchers together and we might as well just have the one event queue to rule them all. > Preemption processing occuring on the main RM dispatcher > > > Key: YARN-3508 > URL: https://issues.apache.org/jira/browse/YARN-3508 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-3508.002.patch, YARN-3508.01.patch > > > We recently saw the RM for a large cluster lag far behind on the > AsyncDispacher event queue. The AsyncDispatcher thread was consistently > blocked on the highly-contended CapacityScheduler lock trying to dispatch > preemption-related events for RMContainerPreemptEventDispatcher. Preemption > processing should occur on the scheduler event dispatcher thread or a > separate thread to avoid delaying the processing of other events in the > primary dispatcher queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
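The comment above argues for routing lock-hungry events to the dispatcher that owns the lock. A hedged, simplified model of that separation (two queues, each drained by its own thread; this is a sketch of the design argument, not YARN's actual AsyncDispatcher):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified model of the separation argued for above: events that need the
// scheduler lock go on the scheduler dispatcher's queue, so a slow scheduler
// cannot stall unrelated events sitting on the main dispatcher's queue.
public class TwoDispatcherSketch {
    static final BlockingQueue<Runnable> mainQueue = new LinkedBlockingQueue<>();
    static final BlockingQueue<Runnable> schedulerQueue = new LinkedBlockingQueue<>();

    static void dispatch(Runnable event, boolean needsSchedulerLock) {
        (needsSchedulerLock ? schedulerQueue : mainQueue).offer(event);
    }

    public static void main(String[] args) {
        dispatch(() -> {}, true);  // preemption event -> scheduler queue
        dispatch(() -> {}, false); // app-lifecycle event -> main queue
        if (schedulerQueue.size() != 1 || mainQueue.size() != 1) {
            throw new AssertionError("events must land on separate queues");
        }
        // Each queue would be drained by its own thread; a preemption event
        // blocked on the scheduler lock no longer delays the main queue.
        System.out.println("ok");
    }
}
```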
[jira] [Updated] (YARN-3259) FairScheduler: Trigger fairShare updates on node events
[ https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3259: --- Summary: FairScheduler: Trigger fairShare updates on node events (was: FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval ) > FairScheduler: Trigger fairShare updates on node events > --- > > Key: YARN-3259 > URL: https://issues.apache.org/jira/browse/YARN-3259 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3259.001.patch, YARN-3259.002.patch, > YARN-3259.003.patch > > > Instead of waiting for update interval unconditionally, we can trigger early > updates on important events - for eg node join and leave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3259) FairScheduler: Trigger fairShare updates on node events
[ https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574817#comment-14574817 ] Hudson commented on YARN-3259: -- FAILURE: Integrated in Hadoop-trunk-Commit #7976 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7976/]) YARN-3259. FairScheduler: Trigger fairShare updates on node events. (Anubhav Dhoot via kasha) (kasha: rev 75885852cc19dd6de12e62498b112d5d70ce87f4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingUpdate.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java > FairScheduler: Trigger fairShare updates on node events > --- > > Key: YARN-3259 > URL: https://issues.apache.org/jira/browse/YARN-3259 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-3259.001.patch, YARN-3259.002.patch, > YARN-3259.003.patch > > > Instead of waiting for update interval unconditionally, we can trigger early > updates on important events - for eg node join and leave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp
Karthik Kambatla created YARN-3774: -- Summary: ZKRMStateStore should use Curator 3.0 and avail CuratorOp Key: YARN-3774 URL: https://issues.apache.org/jira/browse/YARN-3774 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are somewhat involved, and could be improved using CuratorOp introduced in Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version and make this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated YARN-467: Attachment: YARN-574.1.patch > Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.1.0-beta > > Attachments: YARN-574.1.patch, yarn-467-20130322.1.patch, > yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, > yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, > yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, > yarn-467-20130402.2.patch, yarn-467-20130402.patch, yarn-467-testCode.tar > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache (PUBLIC). The jobs start failing with > the below exception. 
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 > failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > we need a mechanism wherein we can create a directory hierarchy and > limit the number of files per directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
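The closing sentence of the report asks for a directory hierarchy that caps files per directory. A hedged sketch of one such layout, assuming a per-directory limit and base-36 subdirectory names (this mirrors the general idea later implemented in YARN's local cache directory management, but the constants and method names here are illustrative only):

```java
// Illustrative sketch: spread cache entries over a directory tree so that no
// single directory exceeds a per-directory file limit. Buckets are named with
// base-36 digits (0-9, a-z); constants are tiny here just to show nesting.
public class CacheDirLayout {
    static final int PER_DIR_LIMIT = 4; // real systems would use a larger cap

    /** Relative subdirectory for the n-th localized resource (0-based). */
    static String subDirFor(int n) {
        int bucket = n / PER_DIR_LIMIT;
        if (bucket == 0) {
            return ""; // the first PER_DIR_LIMIT files live in the cache root
        }
        StringBuilder sb = new StringBuilder();
        while (bucket > 0) {
            sb.insert(0, "/").insert(0, Character.forDigit(bucket % 36, 36));
            bucket /= 36;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        if (!subDirFor(0).equals("")) throw new AssertionError();
        if (!subDirFor(4).equals("1/")) throw new AssertionError();
        if (!subDirFor(8).equals("2/")) throw new AssertionError();
        // Deep nesting kicks in automatically once the flat buckets run out.
        if (!subDirFor(144).equals("1/0/")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

With a layout like this the filesystem's per-directory entry limit is never hit, regardless of how many small files the public cache accumulates.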
[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer
[ https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated YARN-574: Attachment: YARN-574.1.patch > PrivateLocalizer does not support parallel resource download via > ContainerLocalizer > --- > > Key: YARN-574 > URL: https://issues.apache.org/jira/browse/YARN-574 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: YARN-574.1.patch > > > At present private resources will be downloaded in parallel only if multiple > containers request the same resource. Otherwise it will be serial. > The protocol between PrivateLocalizer and ContainerLocalizer supports > multiple downloads; however, it is not used and only one resource is sent for > downloading at a time. > I think we can increase / assure parallelism (even for a single container > requesting resources) for private/application resources by making multiple > downloads per ContainerLocalizer. > Total Parallelism before > = number of threads allotted for PublicLocalizer [public resource] + number > of containers [private and application resource] > Total Parallelism after > = number of threads allotted for PublicLocalizer [public resource] + number > of containers * max downloads per container [private and application resource] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
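The parallelism arithmetic above boils down to giving each ContainerLocalizer its own small download pool. A hedged sketch of that shape, with a stubbed-out download (the method names and the stub are illustrative; the real localizer fetches via FSDownload):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch of "multiple downloads per ContainerLocalizer": instead
// of fetching one resource at a time, hand each resource to a bounded thread
// pool, so a single container's private resources download in parallel.
public class ParallelLocalizerSketch {

    static String download(String resource) {
        return "localized:" + resource; // stand-in for the real fetch
    }

    static List<String> localize(List<String> resources, int maxDownloads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxDownloads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String r : resources) {
                futures.add(pool.submit(() -> download(r)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // preserve request order in the results
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> out = localize(List.of("a.jar", "b.zip"), 2);
        if (!out.equals(List.of("localized:a.jar", "localized:b.zip"))) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

Total parallelism then becomes containers times maxDownloads for private/application resources, exactly the "after" formula in the description.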
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated YARN-467: Attachment: (was: YARN-574.1.patch) > Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.1.0-beta > > Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, > yarn-467-20130322.3.patch, yarn-467-20130322.patch, > yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, > yarn-467-20130401.patch, yarn-467-20130402.1.patch, > yarn-467-20130402.2.patch, yarn-467-20130402.patch, yarn-467-testCode.tar > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache (PUBLIC). The jobs start failing with > the below exception. 
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 > failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > we need a mechanism wherein we can create a directory hierarchy and > limit the number of files per directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer
[ https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated YARN-574: Release Note: YARN-574. Allow parallel download of resources in PrivateLocalizer. Contributed by Zheng Shao. (was: YARN-543. Allow parallel download of resources in PrivateLocalizer. Contributed by Zheng Shao.) > PrivateLocalizer does not support parallel resource download via > ContainerLocalizer > --- > > Key: YARN-574 > URL: https://issues.apache.org/jira/browse/YARN-574 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: YARN-574.1.patch > > > At present private resources will be downloaded in parallel only if multiple > containers request the same resource; otherwise downloads are serial. > The protocol between PrivateLocalizer and ContainerLocalizer supports > multiple downloads; however, it is not used, and only one resource is sent for > downloading at a time. > I think we can increase/ensure parallelism (even for a single container > requesting resources) for private/application resources by making multiple > downloads per ContainerLocalizer. > Total parallelism before > = number of threads allotted for PublicLocalizer [public resource] + number > of containers [private and application resource] > Total parallelism after > = number of threads allotted for PublicLocalizer [public resource] + number > of containers * max downloads per container [private and application resource] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
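The before/after parallelism formulas in the description can be sketched numerically. The thread and container counts below are made-up example values, not numbers from the patch:

```java
// Sketch of the total-parallelism arithmetic from the YARN-574
// description; the concrete counts are illustrative only.
public class ParallelismSketch {
    // Before: one download at a time per ContainerLocalizer.
    public static int totalBefore(int publicThreads, int numContainers) {
        return publicThreads + numContainers;
    }

    // After: each ContainerLocalizer can run several downloads at once.
    public static int totalAfter(int publicThreads, int numContainers,
                                 int maxDownloadsPerContainer) {
        return publicThreads + numContainers * maxDownloadsPerContainer;
    }

    public static void main(String[] args) {
        // e.g. 4 public-localizer threads, 10 containers, 4 downloads each
        System.out.println(totalBefore(4, 10));    // public + containers
        System.out.println(totalAfter(4, 10, 4));  // public + containers * per-container
    }
}
```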
[jira] [Resolved] (YARN-1502) Protocol changes in RM side to support change container resource
[ https://issues.apache.org/jira/browse/YARN-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-1502. -- Resolution: Duplicate This is actually a duplicate of YARN-1646; the "protocol changes" are already done. > Protocol changes in RM side to support change container resource > > > Key: YARN-1502 > URL: https://issues.apache.org/jira/browse/YARN-1502 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Wangda Tan (No longer used) > Attachments: yarn-1502.1.patch, yarn-1502.2.patch > > > This JIRA is to track protocol (including ApplicationMasterProtocol and > ApplicationClientProtocol) changes on the RM side to support changing container > resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3259) FairScheduler: Trigger fairShare updates on node events
[ https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574918#comment-14574918 ] Anubhav Dhoot commented on YARN-3259: - Thanks [~kasha] for review and commit! > FairScheduler: Trigger fairShare updates on node events > --- > > Key: YARN-3259 > URL: https://issues.apache.org/jira/browse/YARN-3259 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-3259.001.patch, YARN-3259.002.patch, > YARN-3259.003.patch > > > Instead of waiting for update interval unconditionally, we can trigger early > updates on important events - for eg node join and leave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574928#comment-14574928 ] Ashwin Shankar commented on YARN-3453: -- Hey folks, looking into the patch; will get back with comments. > Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator > even in DRF mode causing thrashing > > > Key: YARN-3453 > URL: https://issues.apache.org/jira/browse/YARN-3453 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Ashwin Shankar >Assignee: Arun Suresh > Attachments: YARN-3453.1.patch, YARN-3453.2.patch > > > There are two places in the preemption code flow where DefaultResourceCalculator > is used, even in DRF mode. > This basically results in more resources being preempted than needed, and > those extra preempted containers aren't even getting to the "starved" queue, > since the scheduling logic is based on DRF's calculator. > Following are the two places: > 1. {code:title=FSLeafQueue.java|borderStyle=solid} > private boolean isStarved(Resource share) > {code} > A queue shouldn't be marked as "starved" if the dominant resource usage > is >= fair/minshare. > 2. {code:title=FairScheduler.java|borderStyle=solid} > protected Resource resToPreempt(FSLeafQueue sched, long curTime) > {code} > -- > One more thing that I believe needs to change in DRF mode is: during a > preemption round, if preempting a few containers results in satisfying the needs > of a resource type, then we should exit that preemption round, since the > containers that we just preempted should bring the dominant resource usage to > min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
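The starvation check the description proposes can be sketched schematically. This is not the FSLeafQueue code; the resource vectors, numbers, and class name below are assumptions for illustration of the DRF (dominant resource fairness) comparison:

```java
// Schematic sketch of the point above, not FSLeafQueue itself: under DRF
// a queue should count as starved only when its *dominant* resource
// usage is below its fair/min share, not when any single resource
// (e.g. memory alone) is below its share.
public class DrfStarvationSketch {
    // usage/total arrays hold (memory, vcores); values are illustrative.
    public static double dominantShare(double[] usage, double[] clusterTotal) {
        double max = 0;
        for (int i = 0; i < usage.length; i++) {
            max = Math.max(max, usage[i] / clusterTotal[i]);
        }
        return max;
    }

    // Starved only if the dominant usage ratio falls below the
    // dominant fair-share ratio.
    public static boolean isStarved(double[] usage, double[] fairShare,
                                    double[] clusterTotal) {
        return dominantShare(usage, clusterTotal)
                < dominantShare(fairShare, clusterTotal);
    }

    public static void main(String[] args) {
        double[] cluster = {1000, 100}; // 1000 MB, 100 vcores
        double[] usage = {100, 50};     // memory-light, cpu-heavy queue
        double[] fair = {200, 20};
        // Dominant usage (50/100 = 0.5) >= dominant fair share
        // (200/1000 = 0.2), so the queue is NOT starved, even though
        // its memory usage is below its memory share.
        System.out.println(isStarved(usage, fair, cluster));
    }
}
```

A memory-only check (100 < 200) would wrongly mark this queue starved and trigger unnecessary preemption, which is exactly the thrashing described above.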
[jira] [Commented] (YARN-3508) Preemption processing occurring on the main RM dispatcher
[ https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574935#comment-14574935 ] Jian He commented on YARN-3508: --- I think it makes sense to move preemption events from the main dispatcher to the scheduler dispatcher. Otherwise, any non-scheduler events on the main dispatcher will be left waiting while the preemption events grab the scheduler lock, which is unnecessary. > Preemption processing occurring on the main RM dispatcher > > > Key: YARN-3508 > URL: https://issues.apache.org/jira/browse/YARN-3508 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-3508.002.patch, YARN-3508.01.patch > > > We recently saw the RM for a large cluster lag far behind on the > AsyncDispatcher event queue. The AsyncDispatcher thread was consistently > blocked on the highly-contended CapacityScheduler lock trying to dispatch > preemption-related events for RMContainerPreemptEventDispatcher. Preemption > processing should occur on the scheduler event dispatcher thread or a > separate thread to avoid delaying the processing of other events in the > primary dispatcher queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp
[ https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574957#comment-14574957 ] Sean Busbey commented on YARN-3774: --- Please make sure to document the impact of moving to Curator 3. The last time we updated the curator version (2.6.0 -> 2.7.1) they broke compatibility. > ZKRMStateStore should use Curator 3.0 and avail CuratorOp > - > > Key: YARN-3774 > URL: https://issues.apache.org/jira/browse/YARN-3774 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > > YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are > somewhat involved, and could be improved using CuratorOp introduced in > Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version > and make this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574979#comment-14574979 ] Karthik Kambatla commented on YARN-2716: Thanks for the thorough review, Jian. Sorry for missing some simple things in the patch. bq. will safeDelete throw a NoNodeException if deleting a non-existing znode? safeDelete checks if the znode exists before attempting to delete it. So, it shouldn't throw a NoNodeException. bq. why in HA case, zkRetryInterval is calculated as below When HA is not enabled, we should give the store as much time as possible to connect to ZK. When HA is enabled, it is possible the other RM has a better chance of connecting to ZK; so, we should give up trying after the session timeout. YARN-2054 has all the details. Posting a patch shortly to address all the review feedback. > Refactor ZKRMStateStore retry code with Apache Curator > -- > > Key: YARN-2716 > URL: https://issues.apache.org/jira/browse/YARN-2716 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Karthik Kambatla > Attachments: yarn-2716-1.patch, yarn-2716-prelim.patch, > yarn-2716-prelim.patch, yarn-2716-super-prelim.patch > > > Per suggestion by [~kasha] in YARN-2131, it's nice to use Curator to > simplify the retry logic in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
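The check-before-delete semantics described in that answer can be illustrated schematically. This is not the actual ZKRMStateStore/Curator code; a plain in-memory Set stands in for the ZooKeeper namespace, and the class and method names are invented for the sketch:

```java
import java.util.HashSet;
import java.util.Set;

// Schematic illustration of "safeDelete" semantics: check existence
// before deleting so that a missing znode is a no-op rather than an
// error. An in-memory Set stands in for ZooKeeper; the real store
// goes through Curator.
public class SafeDeleteSketch {
    private final Set<String> znodes = new HashSet<>();

    public void create(String path) {
        znodes.add(path);
    }

    // Returns true if the node was deleted, false if it did not exist.
    // Never throws the equivalent of a NoNodeException.
    public boolean safeDelete(String path) {
        if (!znodes.contains(path)) {
            return false; // non-existent znode: quietly do nothing
        }
        znodes.remove(path);
        return true;
    }

    public static void main(String[] args) {
        SafeDeleteSketch store = new SafeDeleteSketch();
        store.create("/rmstore/app1");
        System.out.println(store.safeDelete("/rmstore/app1")); // deleted
        System.out.println(store.safeDelete("/rmstore/app1")); // already gone, no error
    }
}
```

Note that in a real ZooKeeper deployment the existence check and the delete are separate operations, so a concurrent deleter can still race them; the actual code also has to tolerate a NoNodeException at delete time.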
[jira] [Updated] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2716: --- Attachment: yarn-2716-2.patch New patch to address review comments. [~jianhe] - once you are comfortable with the patch, I would like to move methods in {{ZKRMStateStore}} around to put all Curator-related methods together towards the end for better readability. > Refactor ZKRMStateStore retry code with Apache Curator > -- > > Key: YARN-2716 > URL: https://issues.apache.org/jira/browse/YARN-2716 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Karthik Kambatla > Attachments: yarn-2716-1.patch, yarn-2716-2.patch, > yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch > > > Per suggestion by [~kasha] in YARN-2131, it's nice to use curator to > simplify the retry logic in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer
[ https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574988#comment-14574988 ] Hadoop QA commented on YARN-574: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 18s | The applied patch generated 2 new checkstyle issues (total was 213, now 214). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 44s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 4s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 47m 49s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737994/YARN-574.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7588585 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8199/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8199/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8199/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8199/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8199/console | This message was automatically generated. > PrivateLocalizer does not support parallel resource download via > ContainerLocalizer > --- > > Key: YARN-574 > URL: https://issues.apache.org/jira/browse/YARN-574 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: YARN-574.1.patch > > > At present private resources will be downloaded in parallel only if multiple > containers request the same resource. However otherwise it will be serial. > The protocol between PrivateLocalizer and ContainerLocalizer supports > multiple downloads however it is not used and only one resource is sent for > downloading at a time. > I think we can increase / assure parallelism (even for single container > requesting resource) for private/application resources by making multiple > downloads per ContainerLocalizer. 
> Total Parallelism before > = number of threads allotted for PublicLocalizer [public resource] + number > of containers[private and application resource] > Total Parallelism after > = number of threads allotted for PublicLocalizer [public resource] + number > of containers * max downloads per container [private and application resource] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574991#comment-14574991 ] James Taylor commented on YARN-2928: Nice writeup, [~vrushalic]. For your benchmarks, if you're pre-splitting for the HBase direct write path but not for the Phoenix write path, you're not really comparing apples-to-apples. There are a number of ways you can install your KeyPrefixRegionSplitPolicy in Phoenix. The easiest is probably to create the HBase table the same way (through code or using the HBase shell) with the KeyPrefixRegionSplitPolicy specified at create time. Then, in Phoenix you can issue a CREATE TABLE statement against the existing HBase table and it'll just map to it. Then you'll have your split policy for your benchmark in both write paths. An alternative to dynamic columns is to define views over your Phoenix table (http://phoenix.apache.org/views.html). In each view, you could specify the set of columns it contains. Then you can use the regular JDBC metadata APIs to get the set of columns that define your view: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getColumns%28java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String%29 Another interesting angle with views (not sure if this is relevant for your use case or not) is that they're capable of being multi-tenant, where the definition of the "tenant" is up to you (maybe it would map to a User?). In this case, each tenant can define their own derived view and add columns specific to their usage. You can even create secondary indexes over a view. This is the way Phoenix surfaces NoSQL in the SQL world. More here: http://phoenix.apache.org/multi-tenancy.html There is room for improvement in the Phoenix write path, though. I've filed PHOENIX-2028 and plan to work on that shortly. 
If you do end up going with a direct HBase write path, I'd encourage you to use the Phoenix serialization format (through PDataType and derived classes) to ensure you can do adhoc querying on the data. The most important aspect is how your row key is written and the separators you use if you're storing multiple values in the row key. > YARN Timeline Service: Next generation > -- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, > TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp
[ https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574993#comment-14574993 ] Karthik Kambatla commented on YARN-3774: Fair point, Sean. We should hold off until Hadoop 3, primarily to handle potential compat issues. > ZKRMStateStore should use Curator 3.0 and avail CuratorOp > - > > Key: YARN-3774 > URL: https://issues.apache.org/jira/browse/YARN-3774 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > > YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are > somewhat involved, and could be improved using CuratorOp introduced in > Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version > and make this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574996#comment-14574996 ] Sergey Shelukhin commented on YARN-1462: Patch looks like it won't break Tez > AHS API and other AHS changes to handle tags for completed MR jobs > -- > > Key: YARN-1462 > URL: https://issues.apache.org/jira/browse/YARN-1462 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1462-branch-2.7-1.2.patch, > YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, > YARN-1462.3.patch, YARN-1462.4.patch > > > AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables
[ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575025#comment-14575025 ] Joep Rottinghuis commented on YARN-3706: Oops, I introduced an infinite loop: {noformat} java.lang.StackOverflowError at org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineWriterUtils.joinEncoded(TimelineWriterUtils.java:185) {noformat} I'll fix that, it makes the writer a bit slow... > Generalize native HBase writer for additional tables > > > Key: YARN-3706 > URL: https://issues.apache.org/jira/browse/YARN-3706 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Joep Rottinghuis >Assignee: Joep Rottinghuis >Priority: Minor > Attachments: YARN-3706-YARN-2928.001.patch, > YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, > YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch > > > When reviewing YARN-3411 we noticed that we could change the class hierarchy > a little in order to accommodate additional tables easily. > In order to get ready for benchmark testing we left the original layout in > place, as performance would not be impacted by the code hierarchy. > Here is a separate jira to address the hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575059#comment-14575059 ] Vrushali C commented on YARN-2928: -- Hi [~jamestaylor] Thank you for taking the time to look through the write up and for filing PHOENIX-2028. In the context of pre-splits, yes, we wanted both writers to write to tables that were pre-split with the same strategy. However, I believe the folks working on the Phoenix writer mentioned that the only way to achieve that in Phoenix was to use the SPLIT ON substatement, which would have required rewriting the HBase presplitting strategy. Perhaps [~gtCarrera9] can speak to this better. bq. I'd encourage you to use the Phoenix serialization format (through PDataType and derived classes) to ensure you can do adhoc querying on the data Okay, thanks, I will check that out. We are working on a whole set of enhancements for the base writer as well, and I will look at this. bq. The most important aspect is how your row key is written and the separators you use if you're storing multiple values in the row key. You've hit the nail on the head. We do have multiple values with different datatypes in the row key as well as in column names, with and without prefixes, so we have different datatypes and a bunch of separators. [~jrottinghuis] has been addressing these points in YARN-3706, e.g. dealing with storing and parsing byte representations of separators. The timeline service schema has more tables, and we are considering storing aggregated values in these Phoenix-based tables (current thinking is to have them populated via co-processors watching the basic entity table). Thanks for suggesting defining views on Phoenix tables; I will look up more details on that. 
Thanks once again, Vrushali > YARN Timeline Service: Next generation > -- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, > TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
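The row-key concern discussed above (multiple typed values packed into one key with separators) can be illustrated schematically. This is not TimelineWriterUtils itself; the separator byte and the key components are assumptions for the example:

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch, not TimelineWriterUtils: several variable-length
// components joined into one HBase row key by a single-byte separator.
// The separator must never occur inside the encoded component values,
// otherwise the key cannot be parsed back unambiguously.
public class RowKeySketch {
    static final byte SEP = 0x00; // assumed separator byte

    public static byte[] joinEncoded(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.write(SEP); // one separator between adjacent components
            }
            out.write(parts[i], 0, parts[i].length);
        }
        return out.toByteArray();
    }

    public static int countSeparators(byte[] key) {
        int n = 0;
        for (byte b : key) {
            if (b == SEP) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // e.g. a key built from (user, flow); component names are made up.
        byte[] key = joinEncoded("user".getBytes(), "flow".getBytes());
        System.out.println(key.length + " bytes, "
                + countSeparators(key) + " separator(s)");
    }
}
```

The hard part the comment alludes to is not the join but the round trip: values whose serialized form may contain the separator byte have to be escaped or length-prefixed before being packed into the key.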
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575094#comment-14575094 ] Xuan Gong commented on YARN-1462: - Committed into trunk/branch-2. Thanks, zhijie for review, and Sergey for verification. > AHS API and other AHS changes to handle tags for completed MR jobs > -- > > Key: YARN-1462 > URL: https://issues.apache.org/jira/browse/YARN-1462 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1462-branch-2.7-1.2.patch, > YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, > YARN-1462.3.patch, YARN-1462.4.patch > > > AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575102#comment-14575102 ] Hudson commented on YARN-1462: -- FAILURE: Integrated in Hadoop-trunk-Commit #7977 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7977/]) YARN-1462. AHS API and other AHS changes to handle tags for completed MR jobs. Contributed by Xuan Gong (xgong: rev 3e000a919fede85230fcfb06309a1f1d5e0c479c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt > AHS API and other AHS changes to handle tags for completed MR jobs > -- > > Key: YARN-1462 > URL: 
https://issues.apache.org/jira/browse/YARN-1462 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1462-branch-2.7-1.2.patch, > YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, > YARN-1462.3.patch, YARN-1462.4.patch > > > AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-38) Add an option to drain the ResourceManager of all apps for upgrades
[ https://issues.apache.org/jira/browse/YARN-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-38. -- Resolution: Won't Fix Now that we have work-preserving RM/NM restarts, rolling restarts/upgrades shouldn't require draining apps from the RM. YARN-914 addresses draining jobs when a node is decommissioned. > Add an option to drain the ResourceManager of all apps for upgrades > --- > > Key: YARN-38 > URL: https://issues.apache.org/jira/browse/YARN-38 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > MAPREDUCE-4575 for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575163#comment-14575163 ] Hadoop QA commented on YARN-2716: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 18s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 35s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 5s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 57s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 31s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 50m 47s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 92m 58s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12738007/yarn-2716-2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7588585 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8200/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8200/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8200/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8200/console | This message was automatically generated. > Refactor ZKRMStateStore retry code with Apache Curator > -- > > Key: YARN-2716 > URL: https://issues.apache.org/jira/browse/YARN-2716 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Karthik Kambatla > Attachments: yarn-2716-1.patch, yarn-2716-2.patch, > yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch > > > Per suggestion by [~kasha] in YARN-2131, it's nice to use curator to > simplify the retry logic in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575358#comment-14575358 ] Li Lu commented on YARN-2928: - Hi [~jamestaylor] Thank you very much for your suggestions and PHOENIX-2028! I wrote the experimental Phoenix writer code and currently have some follow up questions w.r.t your comments. bq. The easiest is probably to create the HBase table the same way (through code or using the HBase shell) with the KeyPrefixRegionSplitPolicy specified at create time. Then, in Phoenix you can issue a CREATE TABLE statement against the existing HBase table and it'll just map to it. Then you'll have your split policy for your benchmark in both write paths. If I understand this correctly, in this case, Phoenix will inherit pre-split settings from HBase? Will this alter the existing HBase table, including its schema and/or data inside? In general, if one runs CREATE TABLE IF NOT EXISTS or simply CREATE TABLE commands over a pre-split existing HBase table, will Phoenix simply accept the existing table as-is? bq. An alternative to dynamic columns is to define views over your Phoenix table (http://phoenix.apache.org/views.html). I once looked at views but I'm not sure if that fits our write path use case well. Let me briefly talk about our use case in YARN first. In general, we would like to dynamically store the configuration and metrics for each YARN timeline entity in a Phoenix database, such that our timeline reader apps or users can use SQL to query historical data. Phoenix view may make a perfect solution for the reader use cases. However, we are hitting problems on the writer side. We store each configuration/metric key-value pair in a dynamic column. This causes us two main troubles. First, we need to use a dynamically generated SQL statement to write to the Phoenix table which is cumbersome and error-prone. 
Second, when performing aggregations, we need to aggregate over all available metrics for an application (or a user, or a flow), but we cannot simply iterate over those dynamic columns because there is no such API. I'm not sure how to resolve these two problems via Phoenix views, or via existing Phoenix APIs. Actually, I suspect that if it were possible to fall back to the HBase-style APIs, our writer path would be much simpler. bq. If you do end up going with a direct HBase write path, I'd encourage you to use the Phoenix serialization format (through PDataType and derived classes) to ensure you can do adhoc querying on the data. We're currently looking into this method for the aggregation part. We're doing our best to support SQL on the aggregated data by using Phoenix. One potential solution is to use HBase coprocessors to aggregate application data from the HBase storage and then store it in a Phoenix aggregation table. However, if we want to keep aggregating on the Phoenix table, can we also write an HBase coprocessor that reads the Phoenix PDataTypes and aggregates them into other Phoenix tables? If that's possible, are there any stable (or "safe") APIs for PDataTypes? A slightly more general question: is SQL the _only_ API for Phoenix, or are there others? I ask because, from a YARN timeline service perspective, Phoenix is a nice tool through which we can easily offer SQL support to our end users, but we may not necessarily want to program it with SQL all the time. Thank you very much for your comments and help from the Phoenix side. Our current Phoenix writer is more of an experimental version, but we really hope to have something for our aggregators and readers in the near future. 
> YARN Timeline Service: Next generation > -- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, > TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
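For concreteness, the "dynamically generated SQL statement" pain point described in the comment above can be sketched as a small builder that declares one Phoenix dynamic column per metric. This is only an illustration: the table name, key column, and metric names below are invented for the example, not the actual ATS v2 schema, and the real writer would bind values through a JDBC PreparedStatement.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Sketch of a dynamically generated Phoenix UPSERT: each metric key
 * becomes a dynamic column declared inline in the statement. All names
 * here (table, entity_id, metric keys) are hypothetical.
 */
public class PhoenixUpsertBuilder {

  /** Build an UPSERT with one dynamic BIGINT column per metric key. */
  public static String buildUpsert(String table, Map<String, Long> metrics) {
    StringJoiner cols = new StringJoiner(", ");
    StringJoiner params = new StringJoiner(", ");
    cols.add("entity_id");              // fixed key column (assumed)
    params.add("?");
    for (String metric : metrics.keySet()) {
      // Each metric becomes a dynamic column: "name" BIGINT
      cols.add("\"" + metric + "\" BIGINT");
      params.add("?");
    }
    // Values (including the Long metric values) would be bound as JDBC
    // parameters; only the column list is generated here.
    return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
  }

  public static void main(String[] args) {
    Map<String, Long> m = new LinkedHashMap<>();
    m.put("cpu", 42L);
    m.put("mem", 1024L);
    // UPSERT INTO timeline_entity (entity_id, "cpu" BIGINT, "mem" BIGINT) VALUES (?, ?, ?)
    System.out.println(buildUpsert("timeline_entity", m));
  }
}
```

The statement (and its parameter bindings) must be regenerated for every distinct metric set, which is exactly the cumbersome, error-prone part the comment refers to.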
[jira] [Created] (YARN-3775) Job does not exit after all node become unhealthy
Chengshun Xia created YARN-3775: --- Summary: Job does not exit after all node become unhealthy Key: YARN-3775 URL: https://issues.apache.org/jira/browse/YARN-3775 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.1 Environment: Version: 2.7.0 OS: RHEL7 NameNodes: xiachsh11 xiachsh12 (HA enabled) DataNodes: 5 xiachsh13-17 ResourceManager: xiachsh11 NodeManagers: 5 xiachsh13-17 all nodes are openstack provisioned: MEM: 1.5G Disk: 16G Reporter: Chengshun Xia Running Terasort with a data size of 10G, all the containers exited once the disk space threshold of 0.90 was reached; at this point, the job does not exit with an error:
15/06/05 13:13:28 INFO mapreduce.Job: map 9% reduce 0%
15/06/05 13:13:52 INFO mapreduce.Job: map 10% reduce 0%
15/06/05 13:14:30 INFO mapreduce.Job: map 11% reduce 0%
15/06/05 13:15:11 INFO mapreduce.Job: map 12% reduce 0%
15/06/05 13:15:43 INFO mapreduce.Job: map 13% reduce 0%
15/06/05 13:16:38 INFO mapreduce.Job: map 14% reduce 0%
15/06/05 13:16:41 INFO mapreduce.Job: map 15% reduce 0%
15/06/05 13:16:53 INFO mapreduce.Job: map 16% reduce 0%
15/06/05 13:17:24 INFO mapreduce.Job: map 17% reduce 0%
15/06/05 13:17:53 INFO mapreduce.Job: map 18% reduce 0%
15/06/05 13:18:36 INFO mapreduce.Job: map 19% reduce 0%
15/06/05 13:19:03 INFO mapreduce.Job: map 20% reduce 0%
15/06/05 13:19:09 INFO mapreduce.Job: map 15% reduce 0%
15/06/05 13:19:32 INFO mapreduce.Job: map 16% reduce 0%
15/06/05 13:20:00 INFO mapreduce.Job: map 17% reduce 0%
15/06/05 13:20:36 INFO mapreduce.Job: map 18% reduce 0%
15/06/05 13:20:57 INFO mapreduce.Job: map 19% reduce 0%
15/06/05 13:21:22 INFO mapreduce.Job: map 18% reduce 0%
15/06/05 13:21:24 INFO mapreduce.Job: map 14% reduce 0%
15/06/05 13:21:25 INFO mapreduce.Job: map 9% reduce 0%
15/06/05 13:21:28 INFO mapreduce.Job: map 10% reduce 0%
15/06/05 13:22:22 INFO mapreduce.Job: map 11% reduce 0%
15/06/05 13:23:06 INFO mapreduce.Job: map 12% reduce 0%
15/06/05 13:23:41 INFO mapreduce.Job: map 9% reduce 0%
15/06/05 13:23:42 INFO mapreduce.Job: map 5% reduce 0%
15/06/05 13:24:38 INFO mapreduce.Job: map 6% reduce 0%
15/06/05 13:25:16 INFO mapreduce.Job: map 7% reduce 0%
15/06/05 13:25:53 INFO mapreduce.Job: map 8% reduce 0%
15/06/05 13:26:35 INFO mapreduce.Job: map 9% reduce 0%
The last response time is 15/06/05 13:26:35, and the current time is:
[root@xiachsh11 logs]# date
Fri Jun 5 19:19:59 EDT 2015
[root@xiachsh11 logs]#
[root@xiachsh11 logs]# yarn node -list
15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager at xiachsh11.eng.platformlab.ibm.com/9.21.62.234:8032
Total Nodes:0
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
[root@xiachsh11 logs]#
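The 0.90 threshold mentioned in the report is the NodeManager disk health checker's per-disk utilization cutoff: once local dirs pass it, the node is marked unhealthy and removed, which is why `yarn node -list` shows zero nodes. A yarn-site.xml sketch of the relevant knobs (the 95.0 value is only an example for small 16G disks, not a recommendation):

```xml
<!-- yarn-site.xml: a local dir is marked bad once its utilization passes
     this percentage (default 90.0). -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
<!-- Fraction of local dirs that must remain healthy for the NodeManager
     itself to be considered healthy (default 0.25). -->
<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.25</value>
</property>
```

Tuning these only works around the symptom; whether the job should hang forever with zero nodes is the actual bug reported here.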
[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575395#comment-14575395 ] Jian He commented on YARN-2716: --- bq. safeDelete checks if the znode exists before attempting to delete it. So, shouldn't throw NoNodeException. Ah, right, sorry, I overlooked the implementation of the method. My only comment is: - removeApplicationStateInternal can also use {{curatorFramework.delete().deletingChildrenIfNeeded()}} instead of adding all children manually?
[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575406#comment-14575406 ] Karthik Kambatla commented on YARN-2716: bq. removeApplicationStateInternal can also use the curatorFramework.delete().deletingChildrenIfNeeded() instead of adding all children manually ? safeDelete adds the nodes to a transaction that creates and deletes the fencing node as well. Curator transactions don't support {{deletingChildrenIfNeeded}} yet.
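The safeDelete behavior being discussed can be illustrated with a toy model. To be clear, this is not the Curator API: a sorted set of paths stands in for the ZooKeeper namespace, the fencing-node path is an invented name, and the "batch" is just sequential mutation standing in for a ZK multi-op transaction. The two points it captures are the exists-check that avoids NoNodeException, and manual child deletion, since Curator transactions lack deletingChildrenIfNeeded.

```java
import java.util.TreeSet;

/**
 * Toy model of the safeDelete pattern: skip the delete when the znode
 * is absent, and bracket the whole operation with a fencing-node
 * create/delete. NOT the Curator API; names are hypothetical.
 */
public class SafeDeleteSketch {
  // Assumed fencing-node path, invented for the sketch.
  static final String FENCE = "/rmstore/RM_ZK_FENCING_LOCK";

  /** Returns true if a delete batch was applied, false if skipped. */
  public static boolean safeDelete(TreeSet<String> znodes, String path) {
    if (!znodes.contains(path)) {
      return false; // exists-check: no transaction issued at all
    }
    znodes.add(FENCE); // create fencing node (start of the batch)
    // Delete the node and, manually, each of its children -- mirroring
    // the comment that the transaction API can't delete children for us.
    znodes.removeIf(z -> z.equals(path) || z.startsWith(path + "/"));
    znodes.remove(FENCE); // delete fencing node (end of the batch)
    return true;
  }
}
```

In the real store the fence create/delete and the per-znode deletes go into one atomic ZK transaction, so a fenced-out RM fails the whole batch.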
[jira] [Updated] (YARN-3775) Job does not exit after all node become unhealthy
[ https://issues.apache.org/jira/browse/YARN-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengshun Xia updated YARN-3775: Attachment: logs.tar.gz Logs of the resource manager and node manager, /etc/hadoop, and command output of terasort.
[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575417#comment-14575417 ] Jian He commented on YARN-2716: --- bq. creates and deletes the fencing node as well. I see, thanks.
[jira] [Updated] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2716: --- Attachment: yarn-2716-3.patch New patch just moves the class members (fields and methods) around to put all curator-related methods together.
[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3655: Attachment: YARN-3655.004.patch > FairScheduler: potential livelock due to maxAMShare limitation and container > reservation > - > > Key: YARN-3655 > URL: https://issues.apache.org/jira/browse/YARN-3655 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler > Affects Versions: 2.7.0 > Reporter: zhihai xu > Assignee: zhihai xu > Attachments: YARN-3655.000.patch, YARN-3655.001.patch, YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch > > > If a node is reserved by an application, no other application has any chance to assign a new container on this node unless the reserving application assigns a new container on it or releases the reserved container. > The problem is that if an application calls assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it blocks all other applications from using the nodes it reserves. If all other running applications can't release their AM containers because they are blocked by these reserved containers, a livelock situation can happen. > The following code in FSAppAttempt#assignContainer can cause this potential livelock:
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would "
>           + "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM container on the node due to the maxAMShare limitation and the node is reserved by the application.
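The proposed unreserve-on-rejection fix can be modeled with a minimal, self-contained sketch. All types and names below are invented stand-ins, not the real FairScheduler classes, and the maxAMShare check is reduced to simple arithmetic over a single queue.

```java
/**
 * Minimal model of the fix described above: when the maxAMShare check
 * rejects an AM container, the attempt also drops any reservation it
 * holds on the node instead of pinning the node forever. App and Node
 * are hypothetical stand-ins for FSAppAttempt and FSSchedulerNode.
 */
public class AmShareUnreserveSketch {

  static class App { boolean amRunning; int amDemand; }
  static class Node { App reservedBy; }

  /** Returns the amount of resource assigned on the node (0 = none). */
  public static int assignAmContainer(App app, Node node,
      double maxAmShare, int queueAmUsed, int queueCapacity) {
    if (!app.amRunning
        && queueAmUsed + app.amDemand > maxAmShare * queueCapacity) {
      // Allocation would exceed maxAMShare: skip it, but first release
      // our own reservation so other apps can use this node (the fix).
      if (node.reservedBy == app) {
        node.reservedBy = null;
      }
      return 0; // models Resources.none()
    }
    node.reservedBy = null; // reservation fulfilled by the assignment
    app.amRunning = true;
    return app.amDemand;
  }
}
```

Without the `node.reservedBy = null` in the rejection branch, a rejected attempt keeps its reservation, other apps can never place containers there, and the livelock in the description follows.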
[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3655: Attachment: (was: YARN-3655.004.patch)
[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3655: Attachment: YARN-3655.004.patch
[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables
[ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575435#comment-14575435 ] Sangjin Lee commented on YARN-3706: --- OK, I finally got around to making one pass. These are high-level comments (the stack overflow issue notwithstanding). I generally agree with the approach taken here. This will make future implementation work on this a lot safer and easier, with less duplication. (BaseTable.java) - l.102: how about requiring subclasses to provide the default table name along with the conf name, and then providing the default implementation for getTableName()? For example:
{code}
private final String defaultTableName;

protected BaseTable(String tableNameConfName, String defaultTableName) {
  this.tableNameConfName = tableNameConfName;
  this.defaultTableName = defaultTableName;
}
...
public TableName getTableName(Configuration hbaseConf) {
  return TableName.valueOf(hbaseConf.get(tableNameConfName, defaultTableName));
}
{code}
- l.55: I'm not sure I understand the rationale of the setTableName() method; it sounds more like a static helper method, but then it's really a trivial helper method; should it even be here? (BufferedMutatorDelegator.java) - nit: I would remove all the trivial method comments (EntityTable.java) - l.92: should be static - l.111: just curious, is there a strong reason it has to be a singleton? I generally shun singletons (which also causes a bit of a challenge with unit tests). (ColumnImpl.java) - It doesn't implement Column? Shouldn't it? - l.57: it should have TypedBufferedMutator as opposed to TypedBufferedMutator, right? (TimelineEntitySchemaConstants.java) - l.67: nit: username_splits -> USERNAME_SPLITS - findbugs will flag any public constants or methods that return the raw byte[]... See if you can live without them (or make them non-public) (TimelineWriterUtils.java) - l.72: do you think it'd be possible to do the separator encoding once and keep reusing it? It's probably not terribly expensive, but if it is in a critical path, its cost may add up. > Generalize native HBase writer for additional tables > > > Key: YARN-3706 > URL: https://issues.apache.org/jira/browse/YARN-3706 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver > Reporter: Joep Rottinghuis > Assignee: Joep Rottinghuis > Priority: Minor > Attachments: YARN-3706-YARN-2928.001.patch, YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch > > > When reviewing YARN-3411 we noticed that we could change the class hierarchy a little in order to accommodate additional tables easily. > In order to get ready for benchmark testing we left the original layout in place, as performance would not be impacted by the code hierarchy. > Here is a separate jira to address the hierarchy.
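The review's constructor suggestion can be sketched self-containedly. To keep it runnable outside Hadoop, a plain Map stands in for Hadoop's Configuration and String for HBase's TableName, and the config key and default table name are made-up examples, not the real schema constants.

```java
import java.util.Map;

/**
 * Sketch of the BaseTable shape proposed in the review: each subclass
 * supplies its conf-key name and default table name, and inherits the
 * getTableName() implementation. Names here are hypothetical.
 */
abstract class BaseTableSketch {
  private final String tableNameConfName;
  private final String defaultTableName;

  protected BaseTableSketch(String tableNameConfName, String defaultTableName) {
    this.tableNameConfName = tableNameConfName;
    this.defaultTableName = defaultTableName;
  }

  /** Inherited default implementation, per the review comment. */
  public String getTableName(Map<String, String> conf) {
    return conf.getOrDefault(tableNameConfName, defaultTableName);
  }
}

/** A concrete table then only supplies its two names. */
class EntityTableSketch extends BaseTableSketch {
  EntityTableSketch() {
    // Invented key/default, standing in for the real schema constants.
    super("yarn.timeline-service.entity-table.name", "timelineservice.entity");
  }
}
```

This removes the per-subclass getTableName() overrides and the setTableName() helper the review questions.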
[jira] [Created] (YARN-3776) FairScheduler code refactoring to separate out the code paths for assigning a reserved container and a non-reserved container
zhihai xu created YARN-3776: --- Summary: FairScheduler code refactoring to separate out the code paths for assigning a reserved container and a non-reserved container Key: YARN-3776 URL: https://issues.apache.org/jira/browse/YARN-3776 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu FairScheduler code refactoring toSeparate out the code paths for assigning a reserved container and a non-reserved container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3776) FairScheduler code refactoring to separate out the code paths for assigning a reserved container and a non-reserved container
[ https://issues.apache.org/jira/browse/YARN-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3776: Description: FairScheduler code refactoring to separate out the code paths for assigning a reserved container and a non-reserved container. (was: FairScheduler code refactoring toSeparate out the code paths for assigning a reserved container and a non-reserved container.) > FairScheduler code refactoring to separate out the code paths for assigning a > reserved container and a non-reserved container > - > > Key: YARN-3776 > URL: https://issues.apache.org/jira/browse/YARN-3776 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: zhihai xu >Assignee: zhihai xu > > FairScheduler code refactoring to separate out the code paths for assigning > a reserved container and a non-reserved container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3776) FairScheduler code refactoring to separate out the code paths for assigning a reserved container and a non-reserved container
[ https://issues.apache.org/jira/browse/YARN-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3776: Description: FairScheduler code refactoring, as discussed at YARN-3655, Separate out the code paths for assigning a reserved container and a non-reserved container. (was: FairScheduler code refactoring to separate out the code paths for assigning a reserved container and a non-reserved container.) > FairScheduler code refactoring to separate out the code paths for assigning a > reserved container and a non-reserved container > - > > Key: YARN-3776 > URL: https://issues.apache.org/jira/browse/YARN-3776 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: zhihai xu >Assignee: zhihai xu > > FairScheduler code refactoring, as discussed at YARN-3655, Separate out the > code paths for assigning a reserved container and a non-reserved container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3748) Cleanup Findbugs volatile warnings
[ https://issues.apache.org/jira/browse/YARN-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Liptak updated YARN-3748: --- Attachment: YARN-3748.5.patch > Cleanup Findbugs volatile warnings > -- > > Key: YARN-3748 > URL: https://issues.apache.org/jira/browse/YARN-3748 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Gabor Liptak >Priority: Minor > Attachments: YARN-3748.1.patch, YARN-3748.2.patch, YARN-3748.3.patch, > YARN-3748.5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3777) Move all reservation-related tests from TestFairScheduler to TestFairSchedulerReservations.
zhihai xu created YARN-3777: --- Summary: Move all reservation-related tests from TestFairScheduler to TestFairSchedulerReservations. Key: YARN-3777 URL: https://issues.apache.org/jira/browse/YARN-3777 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler, test Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Minor As discussed at YARN-3655, Move all reservation-related tests from TestFairScheduler to TestFairSchedulerReservations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575472#comment-14575472 ] zhihai xu commented on YARN-3655: - [~kasha], thanks for the thorough review. I uploaded a new patch, YARN-3655.004.patch, which addresses your first comment, and I created two follow-up JIRAs, YARN-3776 and YARN-3777, which address your second and third comments. Please review. Many thanks.
[jira] [Updated] (YARN-3777) Move all reservation-related tests from TestFairScheduler to TestFairSchedulerReservations.
[ https://issues.apache.org/jira/browse/YARN-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3777: Component/s: (was: fairscheduler)
[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer
[ https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated YARN-574: Attachment: YARN-574.2.patch Fixed syntax error. > PrivateLocalizer does not support parallel resource download via > ContainerLocalizer > --- > > Key: YARN-574 > URL: https://issues.apache.org/jira/browse/YARN-574 > Project: Hadoop YARN > Issue Type: Sub-task > Affects Versions: 2.6.0, 2.8.0, 2.7.1 > Reporter: Omkar Vinit Joshi > Assignee: Omkar Vinit Joshi > Attachments: YARN-574.1.patch, YARN-574.2.patch > > > At present, private resources are downloaded in parallel only if multiple containers request the same resource; otherwise, downloads are serial. > The protocol between PrivateLocalizer and ContainerLocalizer supports multiple downloads; however, this is not used, and only one resource is sent for download at a time. > I think we can increase / assure parallelism (even for a single container requesting resources) for private/application resources by performing multiple downloads per ContainerLocalizer. > Total parallelism before = number of threads allotted for PublicLocalizer [public resources] + number of containers [private and application resources] > Total parallelism after = number of threads allotted for PublicLocalizer [public resources] + number of containers * max downloads per container [private and application resources]
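The two parallelism formulas in the description above, written out as code; the thread and container counts used in the example are arbitrary, not YARN defaults.

```java
/**
 * The localization parallelism arithmetic from YARN-574's description.
 */
public class LocalizerParallelism {
  /** Before: public-localizer threads + one download per container. */
  public static int before(int publicThreads, int containers) {
    return publicThreads + containers;
  }

  /** After: public-localizer threads + several downloads per container. */
  public static int after(int publicThreads, int containers, int perContainer) {
    return publicThreads + containers * perContainer;
  }
}
```

For example, with 4 public threads, 10 containers, and 4 downloads per container, total parallelism grows from 14 to 44.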