[jira] [Commented] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting NodeHealthScriptRunner
[ https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370844#comment-14370844 ] Hadoop QA commented on YARN-3375: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705561/YARN-3375.patch against trunk revision 4e886eb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7038//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7038//console This message is automatically generated. > NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting > NodeHealthScriptRunner > -- > > Key: YARN-3375 > URL: https://issues.apache.org/jira/browse/YARN-3375 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Devaraj K >Assignee: Devaraj K > Attachments: YARN-3375.patch > > > 1. 
NodeHealthScriptRunner.shouldRun() check happens 3 times when starting
> the NodeHealthScriptRunner.
> {code:title=NodeManager.java|borderStyle=solid}
> if (!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
>   LOG.info("Abey khali");
>   return null;
> }
> {code}
> {code:title=NodeHealthCheckerService.java|borderStyle=solid}
> if (NodeHealthScriptRunner.shouldRun(
>     conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
>   addService(nodeHealthScriptRunner);
> }
> {code}
> {code:title=NodeHealthScriptRunner.java|borderStyle=solid}
> if (!shouldRun(nodeHealthScript)) {
>   LOG.info("Not starting node health monitor");
>   return;
> }
> {code}
> 2. If we don't configure a node health script, or the configured health script
> doesn't have execute permission, the NM logs the uninformative message below.
> {code:xml}
> 2015-03-19 19:55:45,713 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
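Both points above could be addressed by performing the guard exactly once, with a log message that says why the monitor is skipped. A minimal sketch of that idea (class and method bodies are hypothetical stand-ins, not the actual Hadoop sources):

```java
import java.io.File;

// Hypothetical sketch: a single shouldRun() guard with a descriptive log
// message, replacing the three separate checks and the "Abey khali" output.
public class NodeHealthCheckSketch {

    // True only when a health script is configured and is executable.
    static boolean shouldRun(String nodeHealthScript) {
        if (nodeHealthScript == null || nodeHealthScript.trim().isEmpty()) {
            return false;
        }
        File script = new File(nodeHealthScript);
        return script.exists() && script.canExecute();
    }

    public static void main(String[] args) {
        String configured = null; // e.g. conf.get(NM_HEALTH_CHECK_SCRIPT_PATH)
        if (!shouldRun(configured)) {
            System.out.println("Node health script is not configured, or is not "
                + "executable; not starting node health monitor.");
        }
    }
}
```

The check would then live in one service (e.g. NodeHealthCheckerService) and the other call sites would simply trust that the runner was only constructed when the check passed.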
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated YARN-3021: Attachment: YARN-3021.005.patch > YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J >Assignee: Yongjun Zhang > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370811#comment-14370811 ] Yongjun Zhang commented on YARN-3021: - Hi Jian, thanks a lot for your detailed review and comments! I'm attaching rev 5 to address all of them.
* Replaced {{new Text("HDFS_DELEGATION_TOKEN")}} with the predefined constant.
* About "does conf.getStrings strip off the leading or ending empty strings? if not, we may strip those off.": I followed {{JobSubmitter#populateTokenCache}}. I think it makes sense for the user not to put in leading and trailing empty strings.
* Removed NON_RENEWER, but still use an empty renewer string instead of null.
* I tested rev 4 earlier, and I also tested rev 5 with real clusters.
Thanks for taking a look at the new rev.
> YARN's delegation-token handling disallows certain trust setups to operate
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
> Issue Type: Bug
> Components: security
> Affects Versions: 2.3.0
> Reporter: Harsh J
> Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch,
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON,
> and B trusts COMMON (one-way trusts both), and both A and B run HDFS + YARN
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM,
> as it attempts a renewDelegationToken(…) synchronously during application
> submission (to validate the managed token before it adds it to a scheduler
> for automatic renewal). The call obviously fails because the B realm will not
> trust A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
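The behaviour proposed in the last paragraph can be sketched roughly as follows. This is a minimal illustration with made-up names, not the real DelegationTokenRenewer API: attempt the renewal once at submission time, and on failure skip only the renewal scheduling instead of failing the submission.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "attempt the renewal but go easy on the failure":
// a failed renewal skips scheduling instead of bubbling an error back.
public class TokenRenewSketch {

    interface Token {
        void renew() throws Exception;
    }

    final List<Token> renewalSchedule = new ArrayList<>();

    // Returns true when the token was scheduled for automatic renewal.
    boolean tryScheduleRenewal(Token token) {
        try {
            token.renew(); // synchronous validation at submission time
        } catch (Exception e) {
            // e.g. realm B does not trust the RM's principal as renewer:
            // log and skip scheduling, but let the app submission proceed.
            System.out.println("Token renewal failed, skipping renewal scheduling: "
                + e.getMessage());
            return false;
        }
        renewalSchedule.add(token); // renewable token: keep it renewed
        return true;
    }
}
```

This mirrors the described 1.x JobTracker behaviour: the first failed renewal simply stops further renewal attempts for that token.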
[jira] [Commented] (YARN-3378) a load test client that can replay a volume of history files
[ https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370784#comment-14370784 ] Li Lu commented on YARN-3378: - Hi [~Naganarasimha], YARN-2556 was mainly opened for measuring the existing ATS v1's performance. In this JIRA our main focus is to build a client to generate a reasonable load to guide the v2 timeline service's storage design. From our discussion about Phoenix/HBase I believe this is a quite necessary step for us to understand our v2 design. These two JIRAs also work on two different branches. Just like timeline service v1 and v2 may co-exist (as do all related jiras) in the YARN-2928 branch, I don't see any reason to prevent both JIRAs from existing. If you have any special concerns about this, feel free to let us know. Thanks! > a load test client that can replay a volume of history files > > > Key: YARN-3378 > URL: https://issues.apache.org/jira/browse/YARN-3378 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > > It might be good to create a load test client that can replay a large volume > of history files into the timeline service. One can envision running such a > load test client as a mapreduce job and generating a fair amount of load. It > would be useful to spot-check correctness and, more importantly, observe > performance characteristics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370777#comment-14370777 ] Hudson commented on YARN-3379: -- FAILURE: Integrated in Hadoop-trunk-Commit #7379 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7379/]) YARN-3379. Fixed missing data in localityTable and ResourceRequests table in RM WebUI. Contributed by Xuan Gong (jianhe: rev 4e886eb9cbd2dcb128bbfd17309c734083093a4c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppAttemptPage.java * hadoop-yarn-project/CHANGES.txt > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch, YARN-3379.2.patch, > YARN-3379.3.1.patch, YARN-3379.3.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. 
> But there are some information, such as containerLocalityStatistics, > ResourceRequests, are only useful for the Running Applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3378) a load test client that can replay a volume of history files
[ https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370772#comment-14370772 ] Naganarasimha G R commented on YARN-3378: - Hi [~sjlee0] & [~gtCarrera], is this jira planning to do something different from YARN-2556? A patch is already available there. If the current scope of this jira does not target anything different from the older one, then we can close this jira and continue with YARN-2556; otherwise we can close the older jira and either leverage the patch here or continue with the new changes. > a load test client that can replay a volume of history files > > > Key: YARN-3378 > URL: https://issues.apache.org/jira/browse/YARN-3378 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > > It might be good to create a load test client that can replay a large volume > of history files into the timeline service. One can envision running such a > load test client as a mapreduce job and generating a fair amount of load. It > would be useful to spot-check correctness and, more importantly, observe > performance characteristics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370767#comment-14370767 ] Hadoop QA commented on YARN-3126: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697432/resourcelimit-02.patch against trunk revision e37ca22. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/7035//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7035//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7035//console This message is automatically generated. > FairScheduler: queue's usedResource is always more than the maxResource limit > - > > Key: YARN-3126 > URL: https://issues.apache.org/jira/browse/YARN-3126 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.3.0 > Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. 
>Reporter: Xia Hu
> Labels: assignContainer, fairscheduler, resources
> Fix For: trunk-win
>
> Attachments: resourcelimit-02.patch, resourcelimit.patch
>
>
> When submitting a Spark application (both spark-on-yarn-cluster and
> spark-on-yarn-client mode), the queue's usedResources assigned by the
> fair scheduler can always exceed the queue's maxResources limit.
> Reading the fair scheduler code, I suppose this issue happens because the
> requested resources are not checked when assigning a container.
> Here is the detail:
> 1. Choose a queue. In this process, it checks whether the queue's usedResource
> is bigger than its max, with assignContainerPreCheck.
> 2. Then choose an app in that queue.
> 3. Then choose a container. And here is the problem: there is no check whether
> this container would put the queue's resources over its max limit. If a
> queue's usedResource is 13G and the maxResource limit is 16G, then a container
> asking for 4G of resources may be assigned successfully.
> This problem will always happen with Spark applications, because we can ask
> for different container resources in different applications.
> By the way, I have already used the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
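Step 3 in the description above is the missing check. A minimal sketch of the idea, simplified to memory only with illustrative names (not the actual FairScheduler code): verify that current usage plus the container's demand stays within maxResources before assigning.

```java
// Hypothetical sketch of the missing pre-assignment check described in step 3:
// resource usage is tracked in MB only, for brevity.
public class QueueLimitSketch {
    private long usedMB;
    private final long maxMB;

    public QueueLimitSketch(long usedMB, long maxMB) {
        this.usedMB = usedMB;
        this.maxMB = maxMB;
    }

    // Check the container's demand, not just the queue's current usage.
    public boolean tryAssign(long containerMB) {
        if (usedMB + containerMB > maxMB) {
            return false; // assignment would exceed the queue's maxResources
        }
        usedMB += containerMB;
        return true;
    }

    public long getUsedMB() {
        return usedMB;
    }
}
```

With the report's numbers (13G used, 16G max), a 4G container is rejected while a 2G container still fits.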
[jira] [Commented] (YARN-3382) Some of UserMetricsInfo metrics are incorrectly set to root queue metrics
[ https://issues.apache.org/jira/browse/YARN-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370768#comment-14370768 ] Hadoop QA commented on YARN-3382: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705811/YARN-3382.patch against trunk revision e37ca22. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7036//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7036//console This message is automatically generated. > Some of UserMetricsInfo metrics are incorrectly set to root queue metrics > - > > Key: YARN-3382 > URL: https://issues.apache.org/jira/browse/YARN-3382 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal > Attachments: YARN-3382.patch > > > {{appsCompleted}}, {{appsPending}}, {{appsRunning}} etc. 
in > {{UserMetricsInfo}} are incorrectly set to the root queue's value instead of > the user's value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370738#comment-14370738 ] Jian He commented on YARN-3368: --- Yes, we should refine the current web service and also expose the stuff that is missing. The intention is to build a nicer UI using some front-end tools like Bootstrap. Yes, it can be on a branch. > Improve YARN web UI > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > > The goal is to improve the YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI will continue to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. We can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370727#comment-14370727 ] Hadoop QA commented on YARN-3381: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705819/YARN-3381.patch against trunk revision e37ca22. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7037//console This message is automatically generated. > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: John Wang >Assignee: Brahma Reddy Battula > Attachments: YARN-3381.patch > > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370661#comment-14370661 ] Rohith commented on YARN-3369: -- Thanks [~brahmareddy] for providing the patch. I think all the callers of getResourceRequest() need to be verified for null checks. The code below should also check for null, else an NPE can be thrown.
{code}
public synchronized Resource getResource(Priority priority) {
  ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
  return request.getCapability();
}
{code}
> Missing NullPointer check in AppSchedulingInfo causes RM to die
>
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Giovanni Matteo Fumarola
> Assignee: Brahma Reddy Battula
> Priority: Blocker
> Attachments: YARN-3369.2.patch, YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> The first line calls getResourceRequest, and it can return null:
> {code}
> synchronized public ResourceRequest getResourceRequest(
>     Priority priority, String resourceName) {
>   Map<String, ResourceRequest> nodeRequests = requests.get(priority);
>   return (nodeRequests == null) ? null :
>       nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies.
> {quote}
> 2015-03-17 14:14:04,757 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red}*2015-03-17 14:14:04,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..*{color}
> {quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
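A null guard along the lines Rohith suggests could look like the rough, self-contained sketch below. Integer stands in for ResourceRequest so the example runs on its own; the real patch would of course work on the Hadoop types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix: treat a missing request map entry as
// "no outstanding containers" instead of dereferencing a null result.
public class AppSchedulingInfoSketch {
    static final String ANY = "*";

    // priority -> (resourceName -> numContainers); Integer stands in for
    // the real ResourceRequest type to keep the sketch self-contained.
    final Map<Integer, Map<String, Integer>> requests = new HashMap<>();

    Integer getResourceRequest(int priority, String resourceName) {
        Map<String, Integer> nodeRequests = requests.get(priority);
        return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
    }

    // Guarded version of the checkForDeactivation() dereference.
    boolean hasOutstandingRequests(int priority) {
        Integer numContainers = getResourceRequest(priority, ANY);
        return numContainers != null && numContainers > 0; // no NPE when null
    }
}
```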
[jira] [Commented] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370651#comment-14370651 ] Brahma Reddy Battula commented on YARN-3381: Attached the patch. Kindly review! > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: John Wang >Assignee: Brahma Reddy Battula > Attachments: YARN-3381.patch > > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370649#comment-14370649 ] Devaraj K commented on YARN-3225: - OK sure, I will update the patch soon. Thanks. > New parameter or CLI for decommissioning node gracefully in RMAdmin CLI > --- > > Key: YARN-3225 > URL: https://issues.apache.org/jira/browse/YARN-3225 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Devaraj K > Attachments: YARN-3225.patch, YARN-914.patch > > > A new CLI (or an existing CLI with parameters) should put each node on the > decommission list into decommissioning status, and track a timeout to terminate > the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3381: --- Attachment: YARN-3381.patch > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: John Wang >Assignee: Brahma Reddy Battula > Attachments: YARN-3381.patch > > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3381: --- Target Version/s: 3.0.0 (was: 2.8.0) Hadoop Flags: Incompatible change Marking this as an incompatible change. > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: John Wang >Assignee: Brahma Reddy Battula > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3111) Fix ratio problem on FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370625#comment-14370625 ] Peng Zhang commented on YARN-3111: -- [~ashwinshankar77] Thanks, I got it. I'll update the patch to implement 1 & 3 from your suggestions. > Fix ratio problem on FairScheduler page > --- > > Key: YARN-3111 > URL: https://issues.apache.org/jira/browse/YARN-3111 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Minor > Attachments: YARN-3111.1.patch, YARN-3111.png, parenttooltip.png > > > Found 3 problems on the FairScheduler page: > 1. Only memory is used to compute the ratio, even when the queue's > schedulingPolicy is DRF. > 2. When min resources is configured larger than real resources, the steady > fair share ratio is so large that it runs off the page. > 3. When cluster resources are 0 (no NodeManager started), the ratio is displayed > as "NaN% used". > The attached image shows a snapshot of the above problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
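Problem 3 is a divide-by-zero in the percentage display. A small, hypothetical sketch of the guard (illustrative names, not the actual webapp code):

```java
import java.util.Locale;

// Hypothetical guard for problem 3: with no NodeManager registered the
// cluster total is 0, so report 0.0% instead of dividing by zero ("NaN%").
public class UsedRatioSketch {
    static String usedRatio(long usedMB, long totalMB) {
        if (totalMB <= 0) {
            return "0.0% used"; // nothing to divide by yet
        }
        return String.format(Locale.ROOT, "%.1f%% used", 100.0 * usedMB / totalMB);
    }
}
```

Locale.ROOT keeps the decimal separator stable regardless of the server's default locale.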
[jira] [Updated] (YARN-3382) Some of UserMetricsInfo metrics are incorrectly set to root queue metrics
[ https://issues.apache.org/jira/browse/YARN-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Agarwal updated YARN-3382: Attachment: YARN-3382.patch Patch attached. > Some of UserMetricsInfo metrics are incorrectly set to root queue metrics > - > > Key: YARN-3382 > URL: https://issues.apache.org/jira/browse/YARN-3382 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal > Attachments: YARN-3382.patch > > > {{appsCompleted}}, {{appsPending}}, {{appsRunning}} etc. in > {{UserMetricsInfo}} are incorrectly set to the root queue's value instead of > the user's value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3382) Some of UserMetricsInfo metrics are incorrectly set to root queue metrics
Rohit Agarwal created YARN-3382: --- Summary: Some of UserMetricsInfo metrics are incorrectly set to root queue metrics Key: YARN-3382 URL: https://issues.apache.org/jira/browse/YARN-3382 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Rohit Agarwal Assignee: Rohit Agarwal {{appsCompleted}}, {{appsPending}}, {{appsRunning}} etc. in {{UserMetricsInfo}} are incorrectly set to the root queue's value instead of the user's value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned YARN-3381: -- Assignee: Brahma Reddy Battula > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: John Wang >Assignee: Brahma Reddy Battula > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3381) A typographical error in "InvalidStateTransitonException"
John Wang created YARN-3381: --- Summary: A typographical error in "InvalidStateTransitonException" Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: John Wang Appears that "InvalidStateTransitonException" should be "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370573#comment-14370573 ] Brahma Reddy Battula commented on YARN-3369: Thanks a lot!!! > Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3369.2.patch, YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these 2 > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > the first line calls getResourceRequest and it can return null. > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map nodeRequests = requests.get(priority); > return (nodeRequests == null) ? {color:red} null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739) > at java.lang.Thread.run(Thread.java:722) > {color:red} *2015-03-17 14:14:04,758 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, > bbye..*{color} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
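The defensive fix implied by the report above can be sketched in plain Java. This is an illustrative stand-in, not the actual YARN-3369 patch: `ResourceRequestTable`, `hasOutstanding`, and the integer counters are hypothetical names that mimic the nested-map shape of `AppSchedulingInfo.getResourceRequest()` so the null-safe pattern is runnable without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the null-safe lookup pattern described above.
// "ResourceRequestTable" stands in for AppSchedulingInfo's nested request map;
// it is not the real YARN class.
public class ResourceRequestTable {
    // priority -> (resourceName -> numContainers)
    private final Map<Integer, Map<String, Integer>> requests = new HashMap<>();

    public void put(int priority, String resourceName, int numContainers) {
        requests.computeIfAbsent(priority, p -> new HashMap<>())
                .put(resourceName, numContainers);
    }

    // Mirrors getResourceRequest(): may return null when either map level is missing.
    public Integer getNumContainers(int priority, String resourceName) {
        Map<String, Integer> nodeRequests = requests.get(priority);
        return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
    }

    // The buggy code dereferenced the lookup result unconditionally; the guarded
    // version checks for null first, which is the essence of the fix being reviewed.
    public boolean hasOutstanding(int priority, String resourceName) {
        Integer n = getNumContainers(priority, resourceName);
        return n != null && n > 0;   // null-safe: no NPE when the request is gone
    }

    public static void main(String[] args) {
        ResourceRequestTable t = new ResourceRequestTable();
        t.put(1, "*", 3);
        System.out.println(t.hasOutstanding(1, "*"));   // true
        System.out.println(t.hasOutstanding(2, "*"));   // false, and no NPE
    }
}
```

With the unguarded pattern, the second call would throw a NullPointerException, which is exactly the failure mode that killed the RM in the stack trace above.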
[jira] [Commented] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API
[ https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370549#comment-14370549 ] Hadoop QA commented on YARN-3345: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705763/YARN-3345.5.patch against trunk revision 91baca1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 9 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/7034//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.api.TestPBImplRecords Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7034//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7034//console This message is automatically generated. 
> Add non-exclusive node label RMAdmin CLI/API > > > Key: YARN-3345 > URL: https://issues.apache.org/jira/browse/YARN-3345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, > YARN-3345.4.patch, YARN-3345.5.patch > > > As described in YARN-3214 (see design doc attached to that JIRA), we need to add > a non-exclusive node label RMAdmin API and CLI implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370503#comment-14370503 ] Hadoop QA commented on YARN-3356: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705759/YARN-3356.5.patch against trunk revision 91baca1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7033//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7033//console This message is automatically generated. > Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track > used-resources-by-label. 
> -- > > Key: YARN-3356 > URL: https://issues.apache.org/jira/browse/YARN-3356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, > YARN-3356.4.patch, YARN-3356.5.patch > > > Similar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp > should use ResourceUsage to track resource-usage/pending by label for > better resource tracking and preemption. > Also, when an application's pending resource changes (container allocated, > app completed, moved, etc.), we need to update the ResourceUsage of the queue > hierarchies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370480#comment-14370480 ] Hadoop QA commented on YARN-3369: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705751/YARN-3369.2.patch against trunk revision 91baca1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7032//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7032//console This message is automatically generated. 
> Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3369.2.patch, YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these 2 > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > The first line calls getResourceRequest, which can return null. > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map<String, ResourceRequest> nodeRequests = requests.get(priority); > return (nodeRequests == null) ? null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedu
[jira] [Commented] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370467#comment-14370467 ] Hadoop QA commented on YARN-2828: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705729/YARN-2828.006.patch against trunk revision 91baca1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7029//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7029//console This message is automatically generated. > Enable auto refresh of web pages (using http parameter) > --- > > Key: YARN-2828 > URL: https://issues.apache.org/jira/browse/YARN-2828 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tim Robertson >Assignee: Vijay Bhat >Priority: Minor > Attachments: YARN-2828.001.patch, YARN-2828.002.patch, > YARN-2828.003.patch, YARN-2828.004.patch, YARN-2828.005.patch, > YARN-2828.006.patch > > > The MR1 Job Tracker had a useful HTTP parameter of e.g. "&refresh=3" that > could be appended to URLs which enabled a page reload. 
This was very useful > when developing mapreduce jobs, especially to watch counters changing. This > is lost in the YARN interface. > It could be implemented as a page element (e.g. a drop-down), but I'd > recommend not cluttering the page further, and simply bringing back the > optional "refresh" HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370449#comment-14370449 ] Hadoop QA commented on YARN-3379: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705739/YARN-3379.3.1.patch against trunk revision 91baca1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7031//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7031//console This message is automatically generated. 
> Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch, YARN-3379.2.patch, > YARN-3379.3.1.patch, YARN-3379.3.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. > But there are some information, such as containerLocalityStatistics, > ResourceRequests, are only useful for the Running Applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370446#comment-14370446 ] Junping Du commented on YARN-3212: -- Thanks [~jlowe] and [~mingma] for the review and comments! bq. Do we want to handle the DECOMMISSIONING_WITH_TIMEOUT event when the node is already in the DECOMMISSIONING state? Curious if we might get a duplicate decommission event somehow and want to ignore it or if we know for sure this cannot happen in practice. This case can happen when the user submits another decommission CLI command while the node is still decommissioning. I think we should simply ignore it for now, as nothing needs updating if the node is already decommissioning. Per the discussions in YARN-914 and YARN-3225, we will not have timeout tracking and updates on the RM side (the timeout may only be passed to the AM for notification). bq. Do we want to consider DECOMMISSIONING nodes as not active? There are containers actively running on them, and in that sense they are participating in the cluster (and contributing to the overall cluster resource). I think they should still be considered active, but I could be persuaded otherwise. I think we discussed this on YARN-914 before. The conclusion so far is to keep a decommissioning node active (doing otherwise may break some services - I am not 100% sure on this) and to make the node's resource equal to the resource of its assigned containers at any time. Do we need to change this conclusion? bq. In the reconnected node transition there is a switch statement that will debug-log an unexpected state message when in fact the DECOMMISSIONING state is expected for this transition. That's a good point. Will fix it in the v3 patch. Thanks! bq. Curious why the null check is needed in handleNMContainerStatuses? What about this change allows the container statuses to be null? I think so. It looks like the RMNodeReconnectEvent comes from RegisterNodeManagerRequest, and the containerStatuses field (obtained from the proto) can be null. 
So there is an NPE bug here, which I found through a unit test that created an event like "new RMNodeReconnectEvent(node.getNodeID(), node, null, null)" even before this patch. Am I missing something here? bq. It would be nice to see some refactoring of the common code between StatusUpdateWhenHealthyTransition, StatusUpdateWhenUnhealthyTransition, and StatusUpdateWhenDecommissioningTransition. Yes, I should have done that earlier. Will do it in the v3 patch. bq. These change seems unnecessary? These changes are still necessary, because we changed the state transition from one final state to multiple final states (as in the example below), and the interface only accepts an EnumSet. {code} public static class ReconnectNodeTransition implements - SingleArcTransition<RMNodeImpl, RMNodeEvent> { + MultipleArcTransition<RMNodeImpl, RMNodeEvent, NodeState> { {code} bq. Do we need to support the scenario where NM becomes dead when it is being decommissioned? Say decommission timeout is 30 minutes larger than the NM liveness timeout. The node drops out of the cluster for some time and rejoin later all within the decommission time out. Will Yarn show the status as just dead node, or {dead, decommissioning} Now, the node can become LOST (dead) while it is decommissioning. That is no different from a running node getting lost, except that it cannot join back unless the user puts it back through recommission. Make sense? bq. Seems useful for admins to know about it. If we need that, we can consider two types of NodeState. One is liveness state, one is admin state. Then you will have different combinations. We can add necessary log to let admin know about it. Are you talking about a scenario like this: the admin puts some nodes into decommissioning with a timeout, an upgrade script performs an OS upgrade and finishes with a restart at a random time that could be shorter than the decommissioning timeout, and the admin wants these nodes to join back automatically? But how would YARN know whether the admin wants these nodes back after a restart? An explicit set back to the whitelist (recommission) may still be necessary. 
> RMNode State Transition Update with DECOMMISSIONING state > - > > Key: YARN-3212 > URL: https://issues.apache.org/jira/browse/YARN-3212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Junping Du > Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, > YARN-3212-v2.patch > > > As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added, reachable > from the “running” state via a new event - > “decommissioning”. > This new state can transition to “decommissioned” on > Resource_Update if there are no running apps on the NM, on NM reconnect after restart, > or on receiving a DECOMMISSIONED event (after timeout from the CLI). > In addition, it can go back to “running” if the user decides to cancel prev
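The single-arc versus multiple-arc distinction discussed in the comments above can be sketched in plain Java. These interfaces only mimic the shape of Hadoop's state-machine API (the real `SingleArcTransition`/`MultipleArcTransition` live in `org.apache.hadoop.yarn.state`); the reconnect rule below is a simplification of the YARN-3212 discussion, not the actual patch.

```java
import java.util.EnumSet;

// Illustrative state-machine shapes. A single-arc transition always lands in
// one fixed post-state; a multiple-arc transition computes the post-state at
// runtime, so the registration side must declare an EnumSet of possible outcomes.
public class TransitionSketch {
    public enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    public interface SingleArcTransition<E> {
        void transition(E event);          // post-state is fixed at registration time
    }

    public interface MultipleArcTransition<E> {
        NodeState transition(E event);     // post-state decided when the event fires
    }

    // Simplified reconnect rule: a node that reconnects while decommissioning
    // stays DECOMMISSIONING; otherwise it goes back to RUNNING.
    public static final MultipleArcTransition<NodeState> RECONNECT =
        currentState -> currentState == NodeState.DECOMMISSIONING
            ? NodeState.DECOMMISSIONING
            : NodeState.RUNNING;

    // With multiple arcs, all reachable final states must be enumerated up front.
    public static final EnumSet<NodeState> RECONNECT_OUTCOMES =
        EnumSet.of(NodeState.RUNNING, NodeState.DECOMMISSIONING);

    public static void main(String[] args) {
        System.out.println(RECONNECT.transition(NodeState.DECOMMISSIONING));
        System.out.println(RECONNECT.transition(NodeState.RUNNING));
    }
}
```

This is why the patch's change from `SingleArcTransition` to `MultipleArcTransition` also forces the EnumSet-based registration that the reviewer flagged as "seems unnecessary".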
[jira] [Created] (YARN-3380) Add protobuf compatibility checker to jenkins test runs
Li Lu created YARN-3380: --- Summary: Add protobuf compatibility checker to jenkins test runs Key: YARN-3380 URL: https://issues.apache.org/jira/browse/YARN-3380 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu We may want to run the protobuf compatibility checker for each incoming patch, to prevent incompatible changes for rolling upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370440#comment-14370440 ] Jian He commented on YARN-3021: --- thanks Yongjun, some comments on the patch ! - DelegationTokenRenewer: the skipTokenRenewal check should be done under the existing code {{if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN")))}} as below. And I think only doing this check is enough, we don't need checks in other places. {code} if (token.isManaged()) { if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) { LOG.info(applicationId + " found existing hdfs token " + token); hasHdfsToken = true; Text renewer = ((Token) token). decodeIdentifier().getRenewer(); if ((renewer != null && renewer.toString() .equals(Token.NON_RENEWER))) { continue; } } {code} - does conf.getStrings strip off the leading or ending empty strings? if not, we may strip those off. {code} String [] nns = conf.getStrings(MRJobConfig.JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE); {code} - given that this is a work-around fix, maybe not adding the NON_RENEWER publicly in common ? just check for null ? - Did you test the patch on real cluster ? > YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J >Assignee: Yongjun Zhang > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. 
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
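The skip-renewal check Jian He sketches in his review can be illustrated with plain strings. Everything here is a hedged stand-in: `RenewalSkipSketch`, the `NON_RENEWER` marker value, and the exclusion-list parsing are assumptions based on the comment's two concerns (skipping a designated non-renewer, and stripping empty entries from the comma-separated namenode list), not the DelegationTokenRenewer code or the final patch.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the two checks discussed above: (1) skip renewal when
// a token's renewer is null or a designated non-renewer marker, and (2) parse
// the exclusion list while dropping empty entries (the conf.getStrings concern).
public class RenewalSkipSketch {
    // Assumed marker value; the review debates whether to expose such a
    // constant publicly at all, or to just check for null.
    public static final String NON_RENEWER = "";

    // Returns true when renewal should be skipped for this token's renewer.
    public static boolean skipRenewal(String renewer) {
        return renewer == null || renewer.equals(NON_RENEWER);
    }

    // Splits a comma-separated namenode list, dropping empty/whitespace entries,
    // which is the guarantee the review asks about for conf.getStrings.
    public static Set<String> parseExcludedNamenodes(String csv) {
        Set<String> out = new HashSet<>();
        if (csv == null) return out;
        for (String s : csv.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) out.add(t);  // strip leading/ending empty strings
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(skipRenewal(null));
        System.out.println(parseExcludedNamenodes("nn1.example.com, ,nn2.example.com,"));
    }
}
```

The hostnames in `main` are placeholders; the point is only that a trailing comma or blank entry in the configuration value must not produce a bogus empty namenode in the exclusion set.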
[jira] [Comment Edited] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370440#comment-14370440 ] Jian He edited comment on YARN-3021 at 3/20/15 12:29 AM: - thanks Yongjun, some comments on the patch : - DelegationTokenRenewer: the skipTokenRenewal check should be done under the existing code {{if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN")))}} as below. And I think only doing this check is enough, we don't need checks in other places. {code} if (token.isManaged()) { if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) { LOG.info(applicationId + " found existing hdfs token " + token); hasHdfsToken = true; Text renewer = ((Token) token). decodeIdentifier().getRenewer(); if ((renewer != null && renewer.toString() .equals(Token.NON_RENEWER))) { continue; } } {code} - does conf.getStrings strip off the leading or ending empty strings? if not, we may strip those off. {code} String [] nns = conf.getStrings(MRJobConfig.JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE); {code} - given that this is a work-around fix, maybe not adding the NON_RENEWER publicly in common ? just check for null ? - Did you test the patch on real cluster ? was (Author: jianhe): thanks Yongjun, some comments on the patch ! - DelegationTokenRenewer: the skipTokenRenewal check should be done under the existing code {{if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN")))}} as below. And I think only doing this check is enough, we don't need checks in other places. {code} if (token.isManaged()) { if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) { LOG.info(applicationId + " found existing hdfs token " + token); hasHdfsToken = true; Text renewer = ((Token) token). decodeIdentifier().getRenewer(); if ((renewer != null && renewer.toString() .equals(Token.NON_RENEWER))) { continue; } } {code} - does conf.getStrings strip off the leading or ending empty strings? if not, we may strip those off. 
{code} String [] nns = conf.getStrings(MRJobConfig.JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE); {code} - given that this is a work-around fix, maybe not adding the NON_RENEWER publicly in common ? just check for null ? - Did you test the patch on real cluster ? > YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J >Assignee: Yongjun Zhang > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370438#comment-14370438 ] Wangda Tan commented on YARN-3362: -- For the active-user-info, we need some queue-user-by-label metrics as well, such as used-resource-by-user-and-label. These can be placed in the queue-label metrics table. > Add node label usage in RM CapacityScheduler web UI > --- > > Key: YARN-3362 > URL: https://issues.apache.org/jira/browse/YARN-3362 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager, webapp >Reporter: Wangda Tan >Assignee: Naganarasimha G R > > We don't have node label usage in the RM CapacityScheduler web UI now; without > this, users will find it hard to understand what happened to nodes that have > labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370434#comment-14370434 ] Wangda Tan commented on YARN-3362: -- Hi [~Naganarasimha], Thanks for your comments. bq. There will be some common queue metrics across the labels, wont it get repeated across for each label if a queue is mapped to multiple labels? Some common fields may get repeated (like absolute max capacity, etc.), but repeating some of them is not a big issue to me. I think we can show queue-label-metrics + queue-common-metrics for each queue-label. bq. IIUC most of the queue Metrics might not be specific to a label, like Capacity, Absolute max capacity, Max apps, Max AM's per user etc... . Correct me if my understanding on this is wrong. They are, but there are more parameters/metrics in queues for both label and queue; different labels under the same queue can have different user-limit/capacity/maximum-capacity/max-am-resource, etc. We also need to show those to users if possible. bq. Apart from the label specific queue metrics like (label capacity, label abs capacity,used) are there any new Label specific queue metrics you have in your mind ? I think the above answers your question. bq. would it be better to list like If we have this view: 1) How would you show label-specific metrics? 2) What does "used-resource" at the queue level mean (used-resource makes more sense per-label)? 3) How would you check label-wise resource usage for parent queues? bq. Also if required we can have seperate page (/in the labels page/append at the end of CS page) like I think my proposal is still a little clearer; we need to show label-wise metrics to the user. With that, the user can clearly understand resource usage for each partition (just check each label's usage), and a parent's label-wise usage can be shown as well. 
> Add node label usage in RM CapacityScheduler web UI > --- > > Key: YARN-3362 > URL: https://issues.apache.org/jira/browse/YARN-3362 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager, webapp >Reporter: Wangda Tan >Assignee: Naganarasimha G R > > We don't have node label usage in the RM CapacityScheduler web UI now; without > this, users will find it hard to understand what happened to nodes that have > labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
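The used-resource-by-user-and-label metric discussed above amounts to a counter keyed on two levels: label, then user. A minimal sketch of just that keying (hedged: YARN's real `ResourceUsage` and queue-metrics classes are far more elaborate and track full Resource objects; `LabelUserUsage` and its single memory counter are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of per-(label, user) usage tracking for a queue. Memory is
// tracked as a single long for brevity; real YARN metrics track full Resources.
public class LabelUserUsage {
    // label -> (user -> used memory in MB)
    private final Map<String, Map<String, Long>> usedByLabelAndUser = new HashMap<>();

    public void incUsed(String label, String user, long memMb) {
        usedByLabelAndUser.computeIfAbsent(label, l -> new HashMap<>())
                          .merge(user, memMb, Long::sum);
    }

    public long getUsed(String label, String user) {
        return usedByLabelAndUser.getOrDefault(label, Map.of())
                                 .getOrDefault(user, 0L);
    }

    // Label-level rollup: the per-label number a queue page would display.
    public long getUsedByLabel(String label) {
        return usedByLabelAndUser.getOrDefault(label, Map.of())
                                 .values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        LabelUserUsage u = new LabelUserUsage();
        u.incUsed("gpu", "alice", 4096);
        u.incUsed("gpu", "bob", 2048);
        System.out.println(u.getUsedByLabel("gpu"));
    }
}
```

The label names and users here are made up; the structure simply shows why per-label rows can carry per-user sub-rows in a queue-label metrics table without duplicating queue-wide fields.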
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370432#comment-14370432 ] Robert Kanter commented on YARN-3040: - Sorry I didn't reply earlier. I still haven't found the cycles to do patches for ATS work, so please go ahead and continue working on the updated patch. I'll be sure to review it. > [Data Model] Make putEntities operation be aware of the app's context > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API
[ https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3345: - Attachment: YARN-3345.5.patch bq. public/unstable annotations for the newly added records, e.g. SetNodeLabelsAttributesRequest, Done bq. NodeLabelAttributes -> NodeLabel, so that AddToClusterNodeLabelsRequest can later on use the same data structure. Done bq. for node exclusiveness - I think we may use NodeLabel#(get/set)IsExclusive Done bq. “ an un existed node-label=%s” - “non-existing node-label” Done bq. throw YarnException instead of IOException Done bq. below code, how about user wants to set the attributes to be empty if (attr.getAttributes().isEmpty()) { // simply ignore continue; } Done, removed map of attributes, added top level "shareable" bq. add a newInstance method in SetNodeLabelsAttributesResponse and use that Done bq. revert RMNodeLabelsManager change There're some renames, so we cannot revert RMNodeLabelsMgr changes. > Add non-exclusive node label RMAdmin CLI/API > > > Key: YARN-3345 > URL: https://issues.apache.org/jira/browse/YARN-3345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, > YARN-3345.4.patch, YARN-3345.5.patch > > > As described in YARN-3214 (see design doc attached to that JIRA), we need add > non-exclusive node label RMAdmin API and CLI implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370401#comment-14370401 ] Wangda Tan commented on YARN-3356: -- Thanks for the review! Addressed in (ver.5) > Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track > used-resources-by-label. > -- > > Key: YARN-3356 > URL: https://issues.apache.org/jira/browse/YARN-3356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, > YARN-3356.4.patch, YARN-3356.5.patch > > > Similar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp > should use ResourceUsage to track resource-usage/pending by label for > better resource tracking and preemption. > Also, when an application's pending resource changes (container allocated, > app completed, moved, etc.), we need to update the ResourceUsage of the queue > hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3356: - Attachment: YARN-3356.5.patch > Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track > used-resources-by-label. > -- > > Key: YARN-3356 > URL: https://issues.apache.org/jira/browse/YARN-3356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, > YARN-3356.4.patch, YARN-3356.5.patch > > > Similar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp > should use ResourceUsage to track resource-usage/pending by label for > better resource tracking and preemption. > Also, when an application's pending resource changes (container allocated, > app completed, moved, etc.), we need to update the ResourceUsage of the queue > hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370396#comment-14370396 ] Jian He commented on YARN-3356: --- - how about doing the for loop inside the write lock? {code} public void copyAllUsed(ResourceUsage other) { for (Entry<String, UsageByLabel> entry : other.usages.entrySet()) { setUsed(entry.getKey(), Resources.clone(entry.getValue().getUsed())); } } {code} > Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track > used-resources-by-label. > -- > > Key: YARN-3356 > URL: https://issues.apache.org/jira/browse/YARN-3356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, > YARN-3356.4.patch > > > Similar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp > should use ResourceUsage to track resource-usage/pending by label for > better resource tracking and preemption. > Also, when an application's pending resource changes (container allocated, > app completed, moved, etc.), we need to update the ResourceUsage of the queue > hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
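Jian He's suggestion — acquire the write lock once around the whole copy loop, instead of re-locking per entry inside setUsed() — can be sketched as follows. This is a simplified, hypothetical stand-in (a String-to-long map replaces Hadoop's real ResourceUsage/UsageByLabel internals); only the locking pattern mirrors the discussion above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for the scheduler's ResourceUsage: label -> used memory (MB).
// The real class tracks full Resource objects per label; the point here is
// taking the write lock once for the bulk copy.
class SimpleResourceUsage {
    private final Map<String, Long> usages = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void setUsed(String label, long mb) {
        lock.writeLock().lock();
        try {
            usages.put(label, mb);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public long getUsed(String label) {
        lock.readLock().lock();
        try {
            return usages.getOrDefault(label, 0L);
        } finally {
            lock.readLock().unlock();
        }
    }

    // One write-lock acquisition around the whole loop, rather than one per
    // entry via setUsed(). (Production code would also snapshot `other` under
    // its own read lock.)
    public void copyAllUsed(SimpleResourceUsage other) {
        lock.writeLock().lock();
        try {
            for (Map.Entry<String, Long> e : other.usages.entrySet()) {
                usages.put(e.getKey(), e.getValue());
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}

public class CopyAllUsedDemo {
    public static void main(String[] args) {
        SimpleResourceUsage src = new SimpleResourceUsage();
        src.setUsed("", 2048);
        src.setUsed("gpu", 4096);
        SimpleResourceUsage dst = new SimpleResourceUsage();
        dst.copyAllUsed(src);
        System.out.println(dst.getUsed("gpu")); // 4096
    }
}
```

Besides saving lock churn, the single acquisition also makes the copy atomic: a concurrent reader sees either the old usage map or the fully copied one, never a half-copied state.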
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370386#comment-14370386 ] Hadoop QA commented on YARN-2495: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705721/YARN-2495.20150320-1.patch against trunk revision 91baca1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7028//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7028//console This message is automatically generated. 
> Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, > YARN-2495_20141022.1.patch > > > The target of this JIRA is to allow admins to specify labels on each NM. This covers: > - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or > using the script suggested by [~aw] (YARN-2729)) > - The NM will send labels to the RM via the ResourceTracker API > - The RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370370#comment-14370370 ] Junping Du commented on YARN-3225: -- bq. I feel timeout would be enough, anyway we can wait for some other to comment or suggest here. I am fine with keeping timeout here for now. We can discuss the naming issue on other JIRAs later. It shouldn't block the major feature here. > New parameter or CLI for decommissioning node gracefully in RMAdmin CLI > --- > > Key: YARN-3225 > URL: https://issues.apache.org/jira/browse/YARN-3225 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Devaraj K > Attachments: YARN-3225.patch, YARN-914.patch > > > A new CLI (or an existing CLI with parameters) should put each node on the > decommission list into decommissioning status and track a timeout to terminate > the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370363#comment-14370363 ] Hadoop QA commented on YARN-3379: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705737/YARN-3379.3.patch against trunk revision 91baca1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7030//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7030//console This message is automatically generated. > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch, YARN-3379.2.patch, > YARN-3379.3.1.patch, YARN-3379.3.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. 
> But some information, such as containerLocalityStatistics and > ResourceRequests, is only useful for running applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370361#comment-14370361 ] Jian He commented on YARN-3379: --- looks good overall; the attempt page has some format issues, which are tracked at YARN-3301. > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch, YARN-3379.2.patch, > YARN-3379.3.1.patch, YARN-3379.3.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. > But some information, such as containerLocalityStatistics and > ResourceRequests, is only useful for running applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3369: - Attachment: YARN-3369.2.patch Attached the same patch with corrected indentation; will commit when Jenkins comes back. > Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3369.2.patch, YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these two > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > The first line calls getResourceRequest, which can return null: > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map<String, ResourceRequest> nodeRequests = requests.get(priority); > return (nodeRequests == null) ? null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies.
> {quote}2015-03-17 14:14:04,757 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739) > at java.lang.Thread.run(Thread.java:722) > {color:red} *2015-03-17 14:14:04,758 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, > bbye..*{color} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
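The shape of the fix — null-check the looked-up request before dereferencing it — can be sketched with a hypothetical, trimmed-down model of the two-level lookup in AppSchedulingInfo (the class and method names below are illustrative stand-ins, not the actual patch in YARN-3369.2.patch):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the lookup in AppSchedulingInfo:
// priority -> (resourceName -> numContainers).
class RequestTable {
    static final String ANY = "*";
    private final Map<Integer, Map<String, Integer>> requests = new HashMap<>();

    void put(int priority, String resourceName, int numContainers) {
        requests.computeIfAbsent(priority, p -> new HashMap<>())
                .put(resourceName, numContainers);
    }

    // Mirrors getResourceRequest(): returns null when the priority (or the
    // resource name under it) is absent.
    Integer getNumContainers(int priority, String resourceName) {
        Map<String, Integer> nodeRequests = requests.get(priority);
        return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
    }

    // The buggy pattern dereferenced the lookup result directly; the fix is
    // to null-check before reading it.
    boolean hasOutstanding(int priority) {
        Integer num = getNumContainers(priority, ANY);
        return num != null && num > 0;
    }
}

public class NullCheckDemo {
    public static void main(String[] args) {
        RequestTable t = new RequestTable();
        t.put(1, RequestTable.ANY, 3);
        System.out.println(t.hasOutstanding(1));  // true
        System.out.println(t.hasOutstanding(99)); // false, no NPE
    }
}
```

Without the null check, the second call would throw a NullPointerException — and since the scheduler's event-dispatch thread treats that as fatal, the RM exits, exactly as the stack trace above shows.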
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370333#comment-14370333 ] Wangda Tan commented on YARN-2495: -- Went through the patch again; hopefully this is the last round from my side :) 1) StringArrayProto.stringElement -> elements 2) On second thought, I think {{optional bool areNodeLabelsAcceptedByRM = 7 \[default = false\];}} should default to true to be more defensive: we need to make sure there's no error when somebody forgets to set this field. 3) testNodeHeartbeatRequestPBImplWithNullLabels: remove {{original.setNodeLabels(null);}}; the test should still pass. 4) NodeLabelsProviderService -> NodeLabelsProvider; like most other modules, we don't need to make "Service" part of the class name. Change the subclasses and NodeManager.createNodeLabelsProviderService as well. 5) NodeStatusUpdaterImpl.run: {code} 617 int lastHeartbeatID = 0; 618 Set<String> nodeLabelsLastUpdatedToRM = null; 619 if (hasNodeLabelsProvider) { {code} Regardless of hasNodeLabelsProvider, shouldn't nodeLabelsLastUpdatedToRM be null? The default is "no change" rather than "empty", correct? 6) nodeLabelsLastUpdatedToRM -> lastUpdatedNodeLabelsToRM 7) areNodeLabelsUpdated: does it need a null check? And could you add more tests to cover the cases when the newly fetched node labels and/or the last node labels are null?
> Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, > YARN-2495_20141022.1.patch > > > The target of this JIRA is to allow admins to specify labels on each NM. This covers: > - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or > using the script suggested by [~aw] (YARN-2729)) > - The NM will send labels to the RM via the ResourceTracker API > - The RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
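For review item 7 above (null-safety in areNodeLabelsUpdated), a null-tolerant comparison could look like the sketch below. The helper name is illustrative, not the patch's actual code; it assumes — matching the "no change" vs. "empty" distinction in item 5 — that a null set means the provider reported nothing, which is deliberately distinct from an explicit empty set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LabelDiff {
    // Returns true when the freshly fetched labels differ from the last set
    // reported to the RM. null means "nothing reported", which is treated as
    // distinct from an explicit empty set.
    static boolean areNodeLabelsUpdated(Set<String> fetched, Set<String> last) {
        if (fetched == null && last == null) {
            return false;      // nothing reported either time: no update
        }
        if (fetched == null || last == null) {
            return true;       // went from "nothing" to a value, or back
        }
        return !fetched.equals(last);
    }

    public static void main(String[] args) {
        Set<String> empty = new HashSet<>();
        Set<String> gpu = new HashSet<>(Arrays.asList("GPU"));
        System.out.println(areNodeLabelsUpdated(null, null));  // false
        System.out.println(areNodeLabelsUpdated(empty, null)); // true
        System.out.println(areNodeLabelsUpdated(gpu, gpu));    // false
    }
}
```

Handling both nulls explicitly before the equals() call is what makes the comparison safe for the cases Wangda asks tests for: newly fetched labels null, last labels null, or both.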
[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3379: Attachment: YARN-3379.3.1.patch > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch, YARN-3379.2.patch, > YARN-3379.3.1.patch, YARN-3379.3.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. > But some information, such as containerLocalityStatistics and > ResourceRequests, is only useful for running applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370299#comment-14370299 ] Sangjin Lee commented on YARN-3034: --- [~Naganarasimha], I do see the commit, and I'm able to pull it (dda84085cabd8fdf143b380e54e1730802fd9912). You might want to try again. > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370290#comment-14370290 ] Xuan Gong commented on YARN-3379: - Submitted a new patch after some offline discussion with Jian > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch, YARN-3379.2.patch, YARN-3379.3.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. > But some information, such as containerLocalityStatistics and > ResourceRequests, is only useful for running applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3379: Attachment: YARN-3379.3.patch > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch, YARN-3379.2.patch, YARN-3379.3.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. > But some information, such as containerLocalityStatistics and > ResourceRequests, is only useful for running applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2581) NMs need to find a way to get LogAggregationContext
[ https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370279#comment-14370279 ] Anubhav Dhoot commented on YARN-2581: - This is a breaking change and should have been marked as such. This is the error seen on upgrade from previous version {noformat} 2015-03-17 10:29:57,984 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.service.ServiceStateException: java.io.EOFException at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:253) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509) Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:187) at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:142) at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:271) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:298) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:254) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:237) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) ... 
5 more 2015-03-17 10:29:57,995 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG: {noformat} > NMs need to find a way to get LogAggregationContext > --- > > Key: YARN-2581 > URL: https://issues.apache.org/jira/browse/YARN-2581 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.6.0 > > Attachments: YARN-2581.1.patch, YARN-2581.2.patch, YARN-2581.3.patch, > YARN-2581.4.patch > > > After YARN-2569, we have LogAggregationContext for an application in > ApplicationSubmissionContext. NMs need to find a way to get this information. > We have this requirement: all containers in the same application should > honor the same LogAggregationContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370271#comment-14370271 ] Hadoop QA commented on YARN-3369: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705656/YARN-3369.patch against trunk revision 61a4c7f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1152 javac compiler warnings (more than the trunk's current 206 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/7026//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.ipc.TestRPCWaitForProxy org.apache.hadoop.tracing.TestTracing org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.qjournal.TestSecureNNWithQJM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7026//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7026//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7026//console This message is automatically generated. > Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these two > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > The first line calls getResourceRequest, which can return null: > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map<String, ResourceRequest> nodeRequests = requests.get(priority); > return (nodeRequests == null) ? null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies.
> {quote}2015-03-17 14:14:04,757 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.j
[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370246#comment-14370246 ] Hadoop QA commented on YARN-3379: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705704/YARN-3379.2.patch against trunk revision 61a4c7f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7027//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7027//console This message is automatically generated. 
> Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch, YARN-3379.2.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. > But there are some information, such as containerLocalityStatistics, > ResourceRequests, are only useful for the Running Applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Bhat updated YARN-2828: - Attachment: YARN-2828.006.patch > Enable auto refresh of web pages (using http parameter) > --- > > Key: YARN-2828 > URL: https://issues.apache.org/jira/browse/YARN-2828 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tim Robertson >Assignee: Vijay Bhat >Priority: Minor > Attachments: YARN-2828.001.patch, YARN-2828.002.patch, > YARN-2828.003.patch, YARN-2828.004.patch, YARN-2828.005.patch, > YARN-2828.006.patch > > > The MR1 Job Tracker had a useful HTTP parameter of e.g. "&refresh=3" that > could be appended to URLs which enabled a page reload. This was very useful > when developing mapreduce jobs, especially to watch counters changing. This > is lost in the the Yarn interface. > Could be implemented as a page element (e.g. drop down or so), but I'd > recommend that the page not be more cluttered, and simply bring back the > optional "refresh" HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
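As a rough sketch of what honoring such a parameter could look like, the helper below turns an optional {{refresh}} query-string value into an HTML meta-refresh tag. The class and method names are illustrative only, not the YARN webapp API or the attached patch:

```java
// Hypothetical helper: map an optional "refresh" HTTP parameter (e.g.
// "&refresh=3") to an HTML meta-refresh tag, as the MR1 JobTracker UI did.
// Missing, malformed, or non-positive values fall back to no auto-refresh.
public class RefreshParam {
    public static String metaRefreshTag(String refreshParam) {
        if (refreshParam == null) {
            return "";
        }
        try {
            int seconds = Integer.parseInt(refreshParam.trim());
            if (seconds > 0) {
                return "<meta http-equiv=\"refresh\" content=\"" + seconds + "\">";
            }
        } catch (NumberFormatException e) {
            // malformed value: render the page without auto-refresh
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(metaRefreshTag("3"));   // a 3-second auto-refresh tag
        System.out.println(metaRefreshTag("abc")); // empty: no refresh
    }
}
```

A page block would emit the tag into the page head only when the parameter is present, leaving every other page exactly as it is today.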
[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2495: Attachment: YARN-2495.20150320-1.patch Hi [~wangda], updated the patch with the following changes:
* Typo, lable -> label
* NodeStatusUpdaterImpl: no need to call nodeLabelsProvider.getNodeLabels() twice on register/heartbeat
* HeartBeat -> Heartbeat
* NodeStatusUpdaterImpl: when labels are rejected by the RM, log it with a diag message
* StringArrayProto instead of NodeIdToLabelsProto
* NodeStatusUpdaterTest.testNMRegistrationWithLabels renamed to testNodeStatusUpdaterForNodeLabels
* TestResourceTrackerService, lblsMgr -> nodeLabelsMgr
* Validation for "heartbeat without updating labels"
> Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, > YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or > using script suggested by [~aw] (YARN-2729) ) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3379: Attachment: YARN-3379.2.patch > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch, YARN-3379.2.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. > But there are some information, such as containerLocalityStatistics, > ResourceRequests, are only useful for the Running Applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370056#comment-14370056 ] Wangda Tan commented on YARN-2495: -- ResourceTrackerForLabels.labels -> lastReceivedLabels
{code}
262 // heartbeat without updating labels
263 nm.getNodeStatusUpdater().sendOutofBandHeartBeat();
264 resourceTracker.waitTillHeartBeat();
265 resourceTracker.resetNMHeartbeatReceiveFlag();
{code}
Do we need to add a check for this operation? > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495.20150318-1.patch, YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or > using script suggested by [~aw] (YARN-2729) ) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
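One possible shape for the requested check, as a simplified stand-alone model: a stub tracker that records the last label set it received. The real test would go through NodeStatusUpdater and the test's ResourceTracker stub, whose APIs differ from this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified model: a heartbeat that does not update labels should leave the
// tracker with no newly received label set, which is what the test could assert.
public class HeartbeatLabelCheck {
    public static class StubResourceTracker {
        public Set<String> lastReceivedLabels; // null when no labels were attached

        public void onHeartbeat(Set<String> labels) {
            lastReceivedLabels = labels;
        }
    }

    public static void main(String[] args) {
        StubResourceTracker tracker = new StubResourceTracker();
        // registration / first heartbeat carries the node's labels
        tracker.onHeartbeat(new HashSet<>(Arrays.asList("gpu")));
        // "heartbeat without updating labels": nothing attached this time
        tracker.onHeartbeat(null);
        // the check: the RM side received no label update on the second beat
        System.out.println(tracker.lastReceivedLabels == null); // true
    }
}
```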
[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3379: Attachment: YARN-3379.1.patch > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3379.1.patch > > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. > But there are some information, such as containerLocalityStatistics, > ResourceRequests, are only useful for the Running Applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3379: Description: After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and AHS WebUI. But there are some information, such as containerLocalityStatistics, ResourceRequests, are only useful for the Running Applications. > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > > After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and > AHS WebUI. > But there are some information, such as containerLocalityStatistics, > ResourceRequests, are only useful for the Running Applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
[ https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-3379: --- Assignee: Xuan Gong > Missing data in localityTable and ResourceRequests table in RM WebUI > > > Key: YARN-3379 > URL: https://issues.apache.org/jira/browse/YARN-3379 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI
Xuan Gong created YARN-3379: --- Summary: Missing data in localityTable and ResourceRequests table in RM WebUI Key: YARN-3379 URL: https://issues.apache.org/jira/browse/YARN-3379 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370041#comment-14370041 ] Naganarasimha G R commented on YARN-3034: - Hi [~djp], thanks for informing. I was trying to update from the YARN-2928 branch but unfortunately was not getting the modifications from YARN-, tried using {{git pull -v --progress "origin"}} and also pulling through tools like git-cola. Anyway, I will try to clone the branch tomorrow and try again; meanwhile, can you check once whether you are able to get the modifications in the YARN-2928 branch? > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3181: --- Priority: Major (was: Blocker) The earlier commit was causing issues, and I have reverted it recently. I don't think this is an urgent concern. The only reason we would want to fix this is to generally reduce the tech debt in the FairScheduler. Am I missing something? > FairScheduler: Fix up outdated findbugs issues > -- > > Key: YARN-3181 > URL: https://issues.apache.org/jira/browse/YARN-3181 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Brahma Reddy Battula > Attachments: YARN-3181-002.patch, yarn-3181-1.patch > > > In FairScheduler, we have excluded some findbugs-reported errors. Some of > them aren't applicable anymore, and there are a few that can be easily fixed > without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369965#comment-14369965 ] Jason Lowe commented on YARN-2369: -- Sure, feel free to post a proposed design/patch for it. > Environment variable handling assumes values should be appended > --- > > Key: YARN-2369 > URL: https://issues.apache.org/jira/browse/YARN-2369 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jason Lowe >Assignee: Dustin Cote > > When processing environment variables for a container context the code > assumes that the value should be appended to any pre-existing value in the > environment. This may be desired behavior for handling path-like environment > variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a > non-intuitive and harmful way to handle any variable that does not have > path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369953#comment-14369953 ] Dustin Cote commented on YARN-2369: --- [~jlowe] or [~aw], is this one still needed? If it is, I'd like to take a crack at it. I've had problems with LD_LIBRARY_PATH in my own experience, so if it's not fixed by something else in a later version, I think it should be. > Environment variable handling assumes values should be appended > --- > > Key: YARN-2369 > URL: https://issues.apache.org/jira/browse/YARN-2369 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jason Lowe >Assignee: Dustin Cote > > When processing environment variables for a container context the code > assumes that the value should be appended to any pre-existing value in the > environment. This may be desired behavior for handling path-like environment > variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a > non-intuitive and harmful way to handle any variable that does not have > path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
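The behavior described in this issue can be illustrated with a self-contained helper; this is a hypothetical simplification of the merge logic, not the actual YARN code:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Illustration of the append-always behavior: every value is merged onto any
// pre-existing value with the path separator. That is what you want for
// CLASSPATH, and exactly what you do not want for a plain variable.
public class EnvAppend {
    public static void addToEnv(Map<String, String> env, String key, String value) {
        String existing = env.get(key);
        env.put(key, existing == null
            ? value
            : existing + File.pathSeparator + value);
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        addToEnv(env, "CLASSPATH", "/a.jar");
        addToEnv(env, "CLASSPATH", "/b.jar");    // appending: desirable
        addToEnv(env, "JAVA_HOME", "/usr/java");
        addToEnv(env, "JAVA_HOME", "/opt/jdk");  // appending: harmful surprise
        System.out.println(env.get("JAVA_HOME")); // two paths glued together
    }
}
```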
[jira] [Assigned] (YARN-2740) RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-2740: --- Assignee: Naganarasimha G R (was: Wangda Tan) > RM AdminService should prevent admin change labels on nodes when distributed > node label configuration enabled > - > > Key: YARN-2740 > URL: https://issues.apache.org/jira/browse/YARN-2740 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch > > > According to YARN-2495, labels of nodes will be specified when NM do > heartbeat. We shouldn't allow admin modify labels on nodes when distributed > node label configuration enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2740) RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2740: Attachment: YARN-2740.20150320-1.patch Hi [~wangda], As per your [comment|https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14353353&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14353353] in YARN-2495, bq. when distributed node label configuration is set, any direct modify node to labels mapping from RMAdminCLI should be rejected (like -replaceNodeToLabels). This can be done in a separated JIRA. As there was already a JIRA for it, I am taking this one over; the RMWebServices flow also required a similar check, so I added a check and a test case for that as well. > RM AdminService should prevent admin change labels on nodes when distributed > node label configuration enabled > - > > Key: YARN-2740 > URL: https://issues.apache.org/jira/browse/YARN-2740 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch > > > According to YARN-2495, labels of nodes will be specified when NM do > heartbeat. We shouldn't allow admin modify labels on nodes when distributed > node label configuration enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
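The guard being added can be sketched as below; the flag, exception type, and message are illustrative rather than the exact configuration key and RPC exception used by the patch:

```java
// Sketch: when node labels come from the NMs (distributed configuration),
// direct node-to-labels modifications from RMAdminCLI or the REST API
// should be rejected before reaching the NodeLabelManager.
public class LabelUpdateGuard {
    public static boolean distributedNodeLabelsEnabled;

    public static void checkCentralizedNodeLabelConfiguration() {
        if (distributedNodeLabelsEnabled) {
            throw new IllegalStateException(
                "Node-to-labels mappings are reported by the NMs "
                + "(distributed node label configuration); direct "
                + "modification is not allowed");
        }
    }

    public static void main(String[] args) {
        distributedNodeLabelsEnabled = true;
        try {
            checkCentralizedNodeLabelConfiguration();
            System.out.println("update allowed");
        } catch (IllegalStateException e) {
            System.out.println("update rejected"); // expected path here
        }
    }
}
```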
[jira] [Commented] (YARN-3377) TestTimelineServiceClientIntegration fails
[ https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369926#comment-14369926 ] Zhijie Shen commented on YARN-3377: --- Saw that failure too. Will review the patch. > TestTimelineServiceClientIntegration fails > -- > > Key: YARN-3377 > URL: https://issues.apache.org/jira/browse/YARN-3377 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > Attachments: YARN-3377.001.patch > > > TestTimelineServiceClientIntegration fails. It appears we are getting 500 > from the timeline collector. This appears to be mostly an issue with the test > itself. > {noformat} > --- > Test set: > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration > testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration) > Time elapsed: 32.606 sec <<< ERROR! > org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response > from the timeline server. 
> at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342) > at > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74) > {noformat} > The relevant piece from the server side: > {noformat} > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init > INFO: Scanning for root resource and provider classes in the packages: > org.apache.hadoop.yarn.server.timelineservice.collector > org.apache.hadoop.yarn.webapp > org.apache.hadoop.yarn.webapp > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig > logClasses > INFO: Root resource classes found: > class org.apache.hadoop.yarn.webapp.MyTestWebService > class > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig > logClasses > INFO: Provider classes found: > class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider > class org.apache.hadoop.yarn.webapp.GenericExceptionHandler > class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver > Mar 19, 2015 10:48:30 AM > com.sun.jersey.server.impl.application.WebApplicationImpl _initiate > INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM' > Mar 19, 2015 10:48:31 AM > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 > resolve > SEVERE: null > java.lang.IllegalAccessException: Class > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can > not access a member of class > 
org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers "public" > at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95) > at java.lang.Class.newInstance0(Class.java:366) > at java.lang.Class.newInstance(Class.java:325) > at > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467) > at > com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181) > at > com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81) > at > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518) > at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124) > at > com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104) > at > com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120) > at > com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) > at > com.sun.jersey.server.impl.u
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369917#comment-14369917 ] Wangda Tan commented on YARN-3369: -- I think we don't need the null check here, since the Map will always be created when the priority is added. > Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these 2 > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > the first line calls getResourceRequest and it can return null. > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map<String, ResourceRequest> nodeRequests = requests.get(priority); > return (nodeRequests == null) ? null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739) > at java.lang.Thread.run(Thread.java:722) > {color:red} *2015-03-17 14:14:04,758 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, > bbye..*{color} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
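The guard under discussion can be sketched with simplified types (a String key and an Integer container count standing in for Priority and ResourceRequest); this illustrates the null check itself, not the actual patch:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified checkForDeactivation(): getResourceRequest(...) may return null
// when no request map exists for the priority, so the result must be checked
// before dereferencing -- the original code called getNumContainers() directly
// and brought the RM down with an NPE.
public class DeactivationCheck {
    public static final Map<String, Integer> anyRequests = new HashMap<>();

    public static Integer getResourceRequest(String resourceName) {
        return anyRequests.get(resourceName); // may be null
    }

    public static boolean stillActive() {
        Integer request = getResourceRequest("*"); // like ResourceRequest.ANY
        return request != null && request > 0;     // guarded dereference
    }

    public static void main(String[] args) {
        System.out.println(stillActive()); // false: no request, and no NPE
        anyRequests.put("*", 2);
        System.out.println(stillActive()); // true
    }
}
```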
[jira] [Commented] (YARN-3377) TestTimelineServiceClientIntegration fails
[ https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369915#comment-14369915 ] Sangjin Lee commented on YARN-3377: --- With the patch all our tests pass. Could someone take a quick look and provide reviews? Thanks! > TestTimelineServiceClientIntegration fails > -- > > Key: YARN-3377 > URL: https://issues.apache.org/jira/browse/YARN-3377 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > Attachments: YARN-3377.001.patch > > > TestTimelineServiceClientIntegration fails. It appears we are getting 500 > from the timeline collector. This appears to be mostly an issue with the test > itself. > {noformat} > --- > Test set: > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration > testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration) > Time elapsed: 32.606 sec <<< ERROR! > org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response > from the timeline server. 
> at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342) > at > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74) > {noformat} > The relevant piece from the server side: > {noformat} > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init > INFO: Scanning for root resource and provider classes in the packages: > org.apache.hadoop.yarn.server.timelineservice.collector > org.apache.hadoop.yarn.webapp > org.apache.hadoop.yarn.webapp > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig > logClasses > INFO: Root resource classes found: > class org.apache.hadoop.yarn.webapp.MyTestWebService > class > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig > logClasses > INFO: Provider classes found: > class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider > class org.apache.hadoop.yarn.webapp.GenericExceptionHandler > class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver > Mar 19, 2015 10:48:30 AM > com.sun.jersey.server.impl.application.WebApplicationImpl _initiate > INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM' > Mar 19, 2015 10:48:31 AM > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 > resolve > SEVERE: null > java.lang.IllegalAccessException: Class > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can > not access a member of class > 
org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers "public" > at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95) > at java.lang.Class.newInstance0(Class.java:366) > at java.lang.Class.newInstance(Class.java:325) > at > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467) > at > com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181) > at > com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81) > at > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518) > at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124) > at > com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104) > at > com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120) > at > com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369914#comment-14369914 ] Wangda Tan commented on YARN-3369: -- Mostly LGTM, could you indent the code inside the {{if ... }} block? > Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these 2 > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > the first line calls getResourceRequest and it can return null. > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map<String, ResourceRequest> nodeRequests = requests.get(priority); > return (nodeRequests == null) ? null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739) > at java.lang.Thread.run(Thread.java:722) > {color:red} *2015-03-17 14:14:04,758 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, > bbye..*{color} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
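The guard described in this issue can be sketched with a small self-contained stand-in; the class and method names mirror AppSchedulingInfo for readability but this is an illustration, not the attached YARN-3369.patch:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for AppSchedulingInfo: the request lookup can return
// null when no request map exists for the priority, so callers must guard.
class SchedulingInfoSketch {
    static final String ANY = "*";
    private final Map<Integer, Map<String, Integer>> requests = new HashMap<>();

    void addRequest(int priority, String resourceName, int numContainers) {
        requests.computeIfAbsent(priority, p -> new HashMap<>())
                .put(resourceName, numContainers);
    }

    // Mirrors the getResourceRequest() quoted above: null if priority unknown.
    Integer getNumContainers(int priority, String resourceName) {
        Map<String, Integer> nodeRequests = requests.get(priority);
        return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
    }

    // The fix: check for null before dereferencing, instead of letting a
    // NullPointerException propagate and kill the scheduler event loop.
    boolean hasOutstanding(int priority) {
        Integer num = getNumContainers(priority, ANY);
        return num != null && num > 0;
    }
}
```

With the guard in place, an unknown priority simply reports no outstanding containers rather than crashing the RM.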
[jira] [Updated] (YARN-3377) TestTimelineServiceClientIntegration fails
[ https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3377: -- Attachment: YARN-3377.001.patch In TimelineCollectorManager.startWebApp(), it sets the singleton instance of TimelineCollectorManager to the context. This is fine in a normal situation, but some tests need to provide mocked instances over the singleton. The patch keeps the same behavior for the non-test case, but makes tests work as well. > TestTimelineServiceClientIntegration fails > -- > > Key: YARN-3377 > URL: https://issues.apache.org/jira/browse/YARN-3377 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > Attachments: YARN-3377.001.patch > > > TestTimelineServiceClientIntegration fails. It appears we are getting 500 > from the timeline collector. This appears to be mostly an issue with the test > itself. > {noformat} > --- > Test set: > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration > testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration) > Time elapsed: 32.606 sec <<< ERROR! > org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response > from the timeline server. 
> at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342) > at > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74) > {noformat} > The relevant piece from the server side: > {noformat} > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init > INFO: Scanning for root resource and provider classes in the packages: > org.apache.hadoop.yarn.server.timelineservice.collector > org.apache.hadoop.yarn.webapp > org.apache.hadoop.yarn.webapp > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig > logClasses > INFO: Root resource classes found: > class org.apache.hadoop.yarn.webapp.MyTestWebService > class > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig > logClasses > INFO: Provider classes found: > class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider > class org.apache.hadoop.yarn.webapp.GenericExceptionHandler > class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver > Mar 19, 2015 10:48:30 AM > com.sun.jersey.server.impl.application.WebApplicationImpl _initiate > INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM' > Mar 19, 2015 10:48:31 AM > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 > resolve > SEVERE: null > java.lang.IllegalAccessException: Class > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can > not access a member of class > 
org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers "public" > at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95) > at java.lang.Class.newInstance0(Class.java:366) > at java.lang.Class.newInstance(Class.java:325) > at > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467) > at > com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181) > at > com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81) > at > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518) > at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124) > at > com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104) > at > com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120) > at > com.sun.
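The approach described above (keep the singleton for the normal path while letting tests substitute a mock) is commonly done with an overridable accessor. The following is a hypothetical sketch under that assumption; the class names are illustrative and not necessarily how YARN-3377.001.patch implements it:

```java
// Hypothetical stand-in for TimelineCollectorManager's singleton.
class CollectorManagerSketch {
    private static final CollectorManagerSketch INSTANCE = new CollectorManagerSketch();

    static CollectorManagerSketch getInstance() {
        return INSTANCE;
    }
}

class WebAppHost {
    // Tests override this hook; production code keeps the singleton, so the
    // non-test behavior is unchanged.
    protected CollectorManagerSketch collectorManager() {
        return CollectorManagerSketch.getInstance();
    }

    CollectorManagerSketch startWebApp() {
        // The instance placed in the web app context now comes through the
        // hook rather than being hard-wired to the singleton.
        return collectorManager();
    }
}
```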
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369909#comment-14369909 ] Wangda Tan commented on YARN-2495: -- Hi [~Naganarasimha], For 1) I suggest using a separate PB message, such as NodeLabelsProto (or make it generic like StringArrayProto), which contains a {{repeated string}}. Using NodeIdToLabelsProto without using its nodeId will confuse people. About test cases 1. NodeStatusUpdaterTest: Some cases need to be covered: - NM register, should check RTS (ResourceTrackerService) labels (done) - NM heartbeat, should check RTS labels (TODO) - NM heartbeat without update, should check RTS received labels (TODO) - NM heartbeat with update, should check RTS received labels (TODO) 2. TestResourceTrackerService The tests generally look good to me, - lblsMgr -> nodeLabelsMgr or labelsMgr > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495.20150318-1.patch, YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or > using script suggested by [~aw] (YARN-2729) ) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
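A minimal .proto fragment along the lines the comment suggests; the message name NodeLabelsProto comes from the comment, while the field name and number are assumptions for illustration:

```proto
// Hypothetical message per the suggestion above: it carries only the labels,
// so it cannot be confused with NodeIdToLabelsProto's unused nodeId field.
message NodeLabelsProto {
  repeated string nodeLabels = 1;
}
```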
[jira] [Assigned] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote reassigned YARN-2369: - Assignee: Dustin Cote > Environment variable handling assumes values should be appended > --- > > Key: YARN-2369 > URL: https://issues.apache.org/jira/browse/YARN-2369 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jason Lowe >Assignee: Dustin Cote > > When processing environment variables for a container context the code > assumes that the value should be appended to any pre-existing value in the > environment. This may be desired behavior for handling path-like environment > variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a > non-intuitive and harmful way to handle any variable that does not have > path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
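The append-versus-replace distinction described in this issue can be illustrated with a small self-contained sketch; this is a simplification for illustration, not the actual container-launch code:

```java
import java.io.File;
import java.util.Map;

class EnvHandlingSketch {
    // Behavior as described in the issue: every variable is treated as
    // path-like, so a user-supplied value is appended to any inherited value.
    static void appendSemantics(Map<String, String> env, String key, String value) {
        String prev = env.get(key);
        env.put(key, prev == null ? value : prev + File.pathSeparator + value);
    }

    // Intuitive behavior for non-path variables: the new value replaces the
    // old one outright.
    static void replaceSemantics(Map<String, String> env, String key, String value) {
        env.put(key, value);
    }
}
```

Append semantics are sensible for PATH or CLASSPATH, but for a variable like a heap-size option the appended result is surprising and potentially harmful.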
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369897#comment-14369897 ] Brahma Reddy Battula commented on YARN-3369: Attached patch. [~leftnoteasy] and [~kasha], kindly review! > Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these 2 > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > the first line calls getResourceRequest and it can return null. > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map nodeRequests = requests.get(priority); > return (nodeRequests == null) ? {color:red} null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369892#comment-14369892 ] Brahma Reddy Battula commented on YARN-3181: [~ka...@cloudera.com] kindly review the attached patch. This is failing all the patches, so I am marking it as a blocker; please feel free to change the defect severity if you don't agree with me. > FairScheduler: Fix up outdated findbugs issues > -- > > Key: YARN-3181 > URL: https://issues.apache.org/jira/browse/YARN-3181 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3181-002.patch, yarn-3181-1.patch > > > In FairScheduler, we have excluded some findbugs-reported errors. Some of > them aren't applicable anymore, and there are a few that can be easily fixed > without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369886#comment-14369886 ] Yongjun Zhang commented on YARN-3021: - Running the failed test TestRM locally is successful. > YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J >Assignee: Yongjun Zhang > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
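The lenient-renewal logic proposed in the description above (attempt the renewal, but on failure only skip scheduling further renewals instead of failing the submission) can be sketched as follows; the names are illustrative and this is not the RM's actual API:

```java
// Self-contained sketch of the proposed behavior: a failed renewal no longer
// bubbles an error back to the client and fail app submission.
class TokenRenewalSketch {
    interface Renewer {
        long renew(String token) throws Exception; // returns next expiration time
    }

    /** Returns true if the token was scheduled for automatic renewal. */
    static boolean submitWithToken(String token, Renewer renewer) {
        try {
            renewer.renew(token);
            return true;  // trusted realm: keep the token under automatic renewal
        } catch (Exception e) {
            // untrusted realm (e.g. cross-realm DistCp): skip scheduling
            // renewals and let the app run with the token it already has
            return false;
        }
    }
}
```

This mirrors the 1.x JobTracker behavior described above, where a failed renewal merely stopped further renewal attempts.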
[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3181: --- Priority: Blocker (was: Major) > FairScheduler: Fix up outdated findbugs issues > -- > > Key: YARN-3181 > URL: https://issues.apache.org/jira/browse/YARN-3181 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3181-002.patch, yarn-3181-1.patch > > > In FairScheduler, we have excluded some findbugs-reported errors. Some of > them aren't applicable anymore, and there are a few that can be easily fixed > without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3181: --- Issue Type: Bug (was: Improvement) > FairScheduler: Fix up outdated findbugs issues > -- > > Key: YARN-3181 > URL: https://issues.apache.org/jira/browse/YARN-3181 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Brahma Reddy Battula > Attachments: YARN-3181-002.patch, yarn-3181-1.patch > > > In FairScheduler, we have excluded some findbugs-reported errors. Some of > them aren't applicable anymore, and there are a few that can be easily fixed > without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369875#comment-14369875 ] Brahma Reddy Battula commented on YARN-3369: Yes, adding a null check is sufficient. The request for the priority can be null if the container got allocated under any other priority; please check the following snippet for the same: {code} Map nodeRequests = requests.get(priority); return (nodeRequests == null) ? null : nodeRequests.get(resourceName); {code} > Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these 2 > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > the first line calls getResourceRequest and it can return null. > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map nodeRequests = requests.get(priority); > return (nodeRequests == null) ? {color:red} null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369872#comment-14369872 ] Junping Du commented on YARN-3034: -- I have committed YARN-3333 in. [~Naganarasimha], would you mind rebasing the patch and addressing my minor comments? Thanks! > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3369: --- Attachment: YARN-3369.patch > Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these 2 > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > the first line calls getResourceRequest and it can return null. > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map nodeRequests = requests.get(priority); > return (nodeRequests == null) ? {color:red} null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369868#comment-14369868 ] Hadoop QA commented on YARN-3021: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705619/YARN-3021.004.patch against trunk revision 1ccbc29. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7024//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7024//console This message is automatically generated. 
> YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J >Assignee: Yongjun Zhang > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3333) rename TimelineAggregator etc. to TimelineCollector
[ https://issues.apache.org/jira/browse/YARN-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-3333. -- Resolution: Fixed Hadoop Flags: Reviewed > rename TimelineAggregator etc. to TimelineCollector > --- > > Key: YARN-3333 > URL: https://issues.apache.org/jira/browse/YARN-3333 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3333-unit-tests-fixes.patch, YARN-3333.001.patch, > YARN-3333.002.patch > > > Per discussions on YARN-2928, let's rename TimelineAggregator, etc. to > TimelineCollector, etc. > There are also several minor issues on the current branch, which can be fixed > as part of this: > - fixing some imports > - missing license in TestTimelineServerClientIntegration.java > - whitespaces > - missing direct dependency -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3333) rename TimelineAggregator etc. to TimelineCollector
[ https://issues.apache.org/jira/browse/YARN-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369865#comment-14369865 ] Junping Du commented on YARN-3333: -- Thanks [~sjlee0] for the patch! I have committed the patch to YARN-2928. > rename TimelineAggregator etc. to TimelineCollector > --- > > Key: YARN-3333 > URL: https://issues.apache.org/jira/browse/YARN-3333 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3333-unit-tests-fixes.patch, YARN-3333.001.patch, > YARN-3333.002.patch > > > Per discussions on YARN-2928, let's rename TimelineAggregator, etc. to > TimelineCollector, etc. > There are also several minor issues on the current branch, which can be fixed > as part of this: > - fixing some imports > - missing license in TestTimelineServerClientIntegration.java > - whitespaces > - missing direct dependency -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3039) [Collector wireup] Implement timeline app-level collector service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3039: -- Summary: [Collector wireup] Implement timeline app-level collector service discovery (was: [Aggregator wireup] Implement ATS app-appgregator service discovery) > [Collector wireup] Implement timeline app-level collector service discovery > --- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Fix For: YARN-2928 > > Attachments: Service Binding for applicationaggregator of ATS > (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, > YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, > YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch, > YARN-3039-v6.patch, YARN-3039-v7.patch, YARN-3039-v8.patch, YARN-3039.9.patch > > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3374) Collector's web server should randomly bind an available port
[ https://issues.apache.org/jira/browse/YARN-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3374: -- Summary: Collector's web server should randomly bind an available port (was: Aggregator's web server should randomly bind an available port) > Collector's web server should randomly bind an available port > - > > Key: YARN-3374 > URL: https://issues.apache.org/jira/browse/YARN-3374 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > It's based on the configuration now. The approach won't work if we move to > the app-level aggregator container solution. One NM may start multiple such > aggregators, which cannot all bind to the same configured port. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
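Binding to a random available port, as proposed above, is typically done by asking the OS for port 0; a minimal sketch under that assumption:

```java
import java.io.IOException;
import java.net.ServerSocket;

class EphemeralPortSketch {
    // Requesting port 0 lets the OS pick a free ephemeral port, so several
    // per-app collectors on one NM never collide on a single configured port.
    static int bindRandomPort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort(); // the actual port the OS assigned
        }
    }
}
```

The chosen port would then have to be reported back (e.g. to the RM) rather than read from configuration.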
[jira] [Updated] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3359: -- Summary: Recover collector list in RM failed over (was: Recover aggregator (collector) list in RM failed over) > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Junping Du > > Per discussion in YARN-3039, split the recovery work from RMStateStore into a > separate JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3210) [Source organization] Refactor timeline collector according to new code organization
[ https://issues.apache.org/jira/browse/YARN-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3210: -- Summary: [Source organization] Refactor timeline collector according to new code organization (was: [Source organization] Refactor timeline aggregator according to new code organization) > [Source organization] Refactor timeline collector according to new code > organization > > > Key: YARN-3210 > URL: https://issues.apache.org/jira/browse/YARN-3210 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Labels: refactor > Fix For: YARN-2928 > > Attachments: YARN-3210-022715.patch, YARN-3210-030215.patch, > YARN-3210-030215_1.patch, YARN-3210-030215_2.patch > > > We may want to refactor the code of timeline aggregator according to the > discussion of YARN-3166, the code organization for timeline service v2. We > need to refactor the code after we reach an agreement on the aggregator part > of YARN-3166. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3040: -- Summary: [Data Model] Make putEntities operation be aware of the app's context (was: [Data Model] Implement client-side API for handling flows) > [Data Model] Make putEntities operation be aware of the app's context > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3167) [Collector implementation] Implement the core functionality of the timeline collector
[ https://issues.apache.org/jira/browse/YARN-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3167: -- Summary: [Collector implementation] Implement the core functionality of the timeline collector (was: [Aggregator implementation] Implement the core functionality of the TimelineAggregator service) > [Collector implementation] Implement the core functionality of the timeline > collector > - > > Key: YARN-3167 > URL: https://issues.apache.org/jira/browse/YARN-3167 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C > Attachments: RM-AM-NM-Aggregator.png, > Sequence_diagram_User_RM_AM_NM_Aggregator_Writer.png > > > The basic skeleton of the timeline aggregator has been set up by YARN-3030. > We need to implement the core functionality of the base aggregator service. > The key things include > - handling the requests from clients (sync or async) > - buffering data > - handling the aggregation logic > - invoking the storage API -- This message was sent by Atlassian JIRA (v6.3.4#6332)
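The core collector responsibilities listed in YARN-3167 (accepting sync or async requests, buffering data, and invoking the storage API) can be illustrated with a small buffering sketch. This is a hypothetical simplification, not the actual YARN-3167 implementation; the class and method names below are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of a buffering timeline collector. */
public class BufferingCollectorSketch {
    // Entities are represented as plain strings here for simplicity.
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();

    /** Async write path: enqueue the entity and return immediately. */
    public void putEntityAsync(String entity) {
        buffer.add(entity);
    }

    /** Drain all buffered entities into one batch for the storage layer. */
    public List<String> flush() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch);
        // A real collector would pass this batch to the backing-storage
        // write interface (see YARN-3031) instead of returning it.
        return batch;
    }
}
```

A sync write in this model would simply be `putEntityAsync` followed by an immediate `flush`, trading throughput for a durability guarantee before returning to the client.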
[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3116: -- Summary: [Collector wireup] We need an assured way to determine if a container is an AM container on NM (was: [Aggregator wireup] We need an assured way to determine if a container is an AM container on NM) > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add a flag to the container object or create an API to > make this determination. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369848#comment-14369848 ] Zhijie Shen commented on YARN-3040: --- Please hold the review. Per offline discussion, for the AM and NM use case we can move the context info to the aggregator directly. I'll create a new patch soon. > [Data Model] Implement client-side API for handling flows > - > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3040.1.patch > > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3115) [Collector wireup] Work-preserving restarting of per-node timeline collector
[ https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3115: -- Summary: [Collector wireup] Work-preserving restarting of per-node timeline collector (was: [Aggregator wireup] Work-preserving restarting of per-node aggregator) > [Collector wireup] Work-preserving restarting of per-node timeline collector > > > Key: YARN-3115 > URL: https://issues.apache.org/jira/browse/YARN-3115 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Junping Du > > YARN-3030 makes the per-node aggregator work as the aux service of a NM. It > contains the states of the per-app aggregators corresponding to the running > AM containers on this NM. When the NM is restarted in work-preserving mode, this > per-node aggregator state needs to be carried over across the restart as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3087) [Collector implementation] the REST server (web server) for per-node collector does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3087: -- Summary: [Collector implementation] the REST server (web server) for per-node collector does not work if it runs inside node manager (was: [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager) > [Collector implementation] the REST server (web server) for per-node > collector does not work if it runs inside node manager > --- > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Fix For: YARN-2928 > > Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, > YARN-3087-022615.patch > > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. 
> Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3032) [Collector implementation] Implement timeline collector functionality to serve ATS readers' requests for live apps
[ https://issues.apache.org/jira/browse/YARN-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3032: -- Summary: [Collector implementation] Implement timeline collector functionality to serve ATS readers' requests for live apps (was: [Aggregator implementation] Implement ATS writer functionality to serve ATS readers' requests for live apps) > [Collector implementation] Implement timeline collector functionality to > serve ATS readers' requests for live apps > -- > > Key: YARN-3032 > URL: https://issues.apache.org/jira/browse/YARN-3032 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Per design in YARN-2928, implement the functionality in ATS writer to serve > data for live apps coming from ATS readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3038) [Collector wireup] Handle timeline collector failure scenarios
[ https://issues.apache.org/jira/browse/YARN-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3038: -- Summary: [Collector wireup] Handle timeline collector failure scenarios (was: [Aggregator wireup] Handle ATS writer failure scenarios) > [Collector wireup] Handle timeline collector failure scenarios > -- > > Key: YARN-3038 > URL: https://issues.apache.org/jira/browse/YARN-3038 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > > Per design in YARN-2928, consider various ATS writer failure scenarios, and > implement proper handling. > For example, ATS writers may fail and exit due to OOM. It should be retried a > certain number of times in that case. We also need to tie fatal ATS writer > failures (after exhausting all retries) to the application failure, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for timeline collectors
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3031: -- Summary: [Storage abstraction] Create backing storage write interface for timeline collectors (was: [Storage abstraction] Create backing storage write interface for ATS writers) > [Storage abstraction] Create backing storage write interface for timeline > collectors > > > Key: YARN-3031 > URL: https://issues.apache.org/jira/browse/YARN-3031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C > Attachments: Sequence_diagram_write_interaction.2.png, > Sequence_diagram_write_interaction.png, YARN-3031.01.patch, > YARN-3031.02.patch, YARN-3031.03.patch > > > Per design in YARN-2928, come up with the interface for the ATS writer to > write to various backing storages. The interface should be created to capture > the right level of abstractions so that it will enable all backing storage > implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
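The write abstraction described in YARN-3031 (one interface, many backing-storage implementations) can be illustrated with a minimal interface plus an in-memory implementation. The names below are hypothetical sketches, not the committed YARN-3031 API, and entities are simplified to strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical write-side storage abstraction for timeline collectors. */
interface TimelineWriteSink {
    /** Accept one entity for storage; may buffer internally. */
    void write(String entityJson);
    /** Force any buffered writes to the backing store. */
    void flush();
}

/** Minimal in-memory implementation, useful as a test double. */
public class InMemorySink implements TimelineWriteSink {
    private final List<String> stored = new ArrayList<>();

    @Override
    public void write(String entityJson) {
        stored.add(entityJson);
    }

    @Override
    public void flush() {
        // Nothing buffered in this sketch; an HBase- or filesystem-backed
        // implementation would commit pending batches here.
    }

    public int count() {
        return stored.size();
    }
}
```

Keeping the interface this narrow is what lets the collector stay storage-agnostic: the buffering and aggregation logic depends only on `write`/`flush`, so each backing store can choose its own batching strategy behind them.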
[jira] [Updated] (YARN-3033) [Collector wireup] Implement NM starting the standalone timeline collector daemon
[ https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3033: -- Summary: [Collector wireup] Implement NM starting the standalone timeline collector daemon (was: [Aggregator wireup] Implement NM starting the standalone ATS writer companion) > [Collector wireup] Implement NM starting the standalone timeline collector > daemon > - > > Key: YARN-3033 > URL: https://issues.apache.org/jira/browse/YARN-3033 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf > > > Per design in YARN-2928, implement node managers starting the ATS writer > companion. In YARN-2928, we already have an auxiliary service based solution. > Per discussion below, the bulk of that approach has actually been done as > part of YARN-3030. In this ticket we can work on the remaining tasks, for > example: > # any needed change for configuration, esp. running it inside the NM (e.g. > the number of servlet threads) > # set up a start script that starts the per-node aggregator as a standalone > daemon > # for the standalone mode, implement a service that receives requests to set > up and tear down the app-level data -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3034: -- Summary: [Collector wireup] Implement RM starting its timeline collector (was: [Aggregator wireup] Implement RM starting its ATS writer) > [Collector wireup] Implement RM starting its timeline collector > --- > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, > YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch > > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3378) a load test client that can replay a volume of history files
[ https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-3378: --- Assignee: Li Lu > a load test client that can replay a volume of history files > > > Key: YARN-3378 > URL: https://issues.apache.org/jira/browse/YARN-3378 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > > It might be good to create a load test client that can replay a large volume > of history files into the timeline service. One can envision running such a > load test client as a mapreduce job to generate a fair amount of load. It > would be useful for spot-checking correctness and, more importantly, for observing > performance characteristics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3030) [Collector wireup] Set up timeline collector with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3030: -- Summary: [Collector wireup] Set up timeline collector with basic request serving structure and lifecycle (was: [Aggregator wireup] Set up ATS writer with basic request serving structure and lifecycle) > [Collector wireup] Set up timeline collector with basic request serving > structure and lifecycle > --- > > Key: YARN-3030 > URL: https://issues.apache.org/jira/browse/YARN-3030 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Fix For: YARN-2928 > > Attachments: YARN-3030.001.patch, YARN-3030.002.patch, > YARN-3030.003.patch, YARN-3030.004.patch > > > Per design in YARN-2928, create an ATS writer as a service, and implement the > basic service structure including the lifecycle management. > Also, as part of this JIRA, we should come up with the ATS client API for > sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3377) TestTimelineServiceClientIntegration fails
[ https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369838#comment-14369838 ] Sangjin Lee commented on YARN-3377: --- The root cause is known. I'll post a patch once YARN- is resolved. > TestTimelineServiceClientIntegration fails > -- > > Key: YARN-3377 > URL: https://issues.apache.org/jira/browse/YARN-3377 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > > TestTimelineServiceClientIntegration fails. It appears we are getting 500 > from the timeline collector. This appears to be mostly an issue with the test > itself. > {noformat} > --- > Test set: > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration > testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration) > Time elapsed: 32.606 sec <<< ERROR! > org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response > from the timeline server. 
> at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342) > at > org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74) > {noformat} > The relevant piece from the server side: > {noformat} > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init > INFO: Scanning for root resource and provider classes in the packages: > org.apache.hadoop.yarn.server.timelineservice.collector > org.apache.hadoop.yarn.webapp > org.apache.hadoop.yarn.webapp > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig > logClasses > INFO: Root resource classes found: > class org.apache.hadoop.yarn.webapp.MyTestWebService > class > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService > Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig > logClasses > INFO: Provider classes found: > class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider > class org.apache.hadoop.yarn.webapp.GenericExceptionHandler > class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver > Mar 19, 2015 10:48:30 AM > com.sun.jersey.server.impl.application.WebApplicationImpl _initiate > INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM' > Mar 19, 2015 10:48:31 AM > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 > resolve > SEVERE: null > java.lang.IllegalAccessException: Class > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can > not access a member of class > 
org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers "public" > at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95) > at java.lang.Class.newInstance0(Class.java:366) > at java.lang.Class.newInstance(Class.java:325) > at > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467) > at > com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181) > at > com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81) > at > com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518) > at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124) > at > com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104) > at > com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120) > at > com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) > at > com.sun.jersey.server.impl.uri.rules.RightHandP