[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951785#comment-13951785 ] Jian He commented on YARN-1879: --- Both register and unregister check whether the application has registered/unregistered before. If the application has already registered/unregistered, duplicate register/unregister attempts can cause exceptions. Are they still idempotent? > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1879.1.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
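The concern above is easiest to see in a stripped-down form. The sketch below is illustrative only -- it is not the actual ApplicationMasterService code, and the class and method names are invented -- but it shows why a plain "already registered" guard that throws does not make a retried register look idempotent to the caller.
{code}
// Illustrative sketch only -- not the real ApplicationMasterService logic.
import java.util.HashSet;
import java.util.Set;

class RegisterGuardSketch {
  private final Set<String> registeredAttempts = new HashSet<String>();

  synchronized void register(String appAttemptId) {
    if (!registeredAttempts.add(appAttemptId)) {
      // A retried register of an already-registered attempt lands here and
      // fails, even though the first attempt succeeded -- so the call is not
      // idempotent from the caller's point of view.
      throw new IllegalStateException(
          "Application Master is already registered: " + appAttemptId);
    }
    // ... normal registration work would happen here ...
  }
}
{code}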
[jira] [Commented] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.
[ https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951783#comment-13951783 ] Jian He commented on YARN-1890: --- +1 for cleaning it up. > Too many unnecessary logs are logged while accessing applicationMaster web UI. > -- > > Key: YARN-1890 > URL: https://issues.apache.org/jira/browse/YARN-1890 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith >Assignee: Rohith >Priority: Minor > > Accessing the applicationMaster UI, which is redirected from the RM UI, writes too many > entries to the ResourceManager and ProxyServer logs. On every refresh, logging > is done in WebAppProxyServlet.doGet(). All my RM and ProxyServer logs are > filled with UI information messages which are not really necessary for the user. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951780#comment-13951780 ] Xuan Gong commented on YARN-1879: - [~ozawa] Mind if I take over this ticket? I will make some progress this weekend. Feel free to take it back. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Attachments: YARN-1879.1.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1879: Attachment: YARN-1879.1.patch > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1879.1.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1879: --- Assignee: Xuan Gong (was: Tsuyoshi OZAWA) > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1879.1.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951779#comment-13951779 ] Xuan Gong commented on YARN-1879: - ApplicationMasterService#registerApplicationMaster does have the lastResponse logic, so I think it is OK for us to add the annotation in this ticket. ApplicationMasterService#finishApplicationMaster just submits RMAppAttemptUnregistrationEvent and checks the status, so adding the idempotent annotation is OK too. So we can fix all three APIs together in this ticket. Mark * allocate --> AtMostOnce * registerApplicationMaster --> Idempotent * finishApplicationMaster --> Idempotent > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.2#6252)
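For reference, the marking proposed above might look roughly like the sketch below; this is only a sketch of the idea, and the attached YARN-1879.1.patch may differ in details such as javadoc and audience annotations.
{code}
import java.io.IOException;

import org.apache.hadoop.io.retry.AtMostOnce;
import org.apache.hadoop.io.retry.Idempotent;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterResponse;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

public interface ApplicationMasterProtocol {

  // Proposed Idempotent: registerApplicationMaster already tracks lastResponse.
  @Idempotent
  RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) throws YarnException, IOException;

  // Proposed Idempotent: finishApplicationMaster submits
  // RMAppAttemptUnregistrationEvent and checks the status.
  @Idempotent
  FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request) throws YarnException, IOException;

  // Proposed AtMostOnce: a retried heartbeat gets the previously cached response.
  @AtMostOnce
  AllocateResponse allocate(AllocateRequest request) throws YarnException, IOException;
}
{code}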
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951778#comment-13951778 ] Fengdong Yu commented on YARN-1889: --- Hi, Zhiguo, what my comments you addressed in your new patch? I cannot see any change. 1. there are still tabs in the patch 2. move following initialization into the constructor {code} + private Priority priority = recordFactory.newRecordInstance(Priority.class); + private ResourceWeights resourceWeights = new ResourceWeights(); {code} 3: As Sandy said, don't use recordFactory.newRecordInstance(Priority.class), instead, use Priority.newInstance(1) 4: so remove priority.setPriority(1); {code} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. -Priority p = recordFactory.newRecordInstance(Priority.class); -p.setPriority(1); -return p; +priority.setPriority(1); +return priority; } {code} 5: please rename to getResourceWeights() {code} + public ResourceWeights getResourceWeightsObject() { + return resourceWeights; + } {code} > avoid creating new objects on each fair scheduler call to AppSchedulable > comparator > --- > > Key: YARN-1889 > URL: https://issues.apache.org/jira/browse/YARN-1889 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Hong Zhiguo >Priority: Minor > Attachments: YARN-1889.patch > > > In fair scheduler, in each scheduling attempt, a full sort is > performed on List of AppSchedulable, which invokes Comparator.compare > method many times. Both FairShareComparator and DRFComparator call > AppSchedulable.getWeights, and AppSchedulable.getPriority. > A new ResourceWeights object is allocated on each call of getWeights, > and the same for getPriority. This introduces a lot of pressure to > GC because these methods are called very very frequently. > Below test case shows improvement on performance and GC behaviour. The > results show that the GC pressure during processing NodeUpdate is recuded > half by this patch. > The code to show the improvement: (Add it to TestFairScheduler.java) > import java.lang.management.GarbageCollectorMXBean; > import java.lang.management.ManagementFactory; > public void printGCStats() { > long totalGarbageCollections = 0; > long garbageCollectionTime = 0; > for(GarbageCollectorMXBean gc : > ManagementFactory.getGarbageCollectorMXBeans()) { > long count = gc.getCollectionCount(); > if(count >= 0) { > totalGarbageCollections += count; > } > long time = gc.getCollectionTime(); > if(time >= 0) { > garbageCollectionTime += time; > } > } > System.out.println("Total Garbage Collections: " > + totalGarbageCollections); > System.out.println("Total Garbage Collection Time (ms): " > + garbageCollectionTime); > } > @Test > public void testImpactOnGC() throws Exception { > scheduler.reinitialize(conf, resourceManager.getRMContext()); > // Add nodes > int numNode = 1; > for (int i = 0; i < numNode; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, > host); > NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); > scheduler.handle(nodeEvent); > assertEquals(1024 * 64 * (i+1), > scheduler.getClusterCapacity().getMemory()); > } > assertEquals(numNode, scheduler.getNumClusterNodes()); > assertEquals(1024 * 64 * numNode, > scheduler.getClusterCapacity().getMemory()); > // add apps, each app has 100 containers. 
> int minReqSize = > > FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; > int numApp = 8000; > int priority = 1; > for (int i = 1; i < numApp + 1; ++i) { > ApplicationAttemptId attemptId = createAppAttemptId(i, 1); > AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( > attemptId.getApplicationId(), "queue1", "user1"); > scheduler.handle(appAddedEvent); > AppAttemptAddedSchedulerEvent attemptAddedEvent = > new AppAttemptAddedSchedulerEvent(attemptId, false); > scheduler.handle(attemptAddedEvent); > createSchedulingRequestExistingApplication(minReqSize * 2, 1, > priority, attemptId); > } > scheduler.update(); > assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", > true) > .getRunnableAppSchedulables().size()); > System.out.println("GC stats before NodeUpdate processing:"); > printGCStats(); > int h
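Taken together, the review points above would leave AppSchedulable looking roughly like the sketch below. This is a sketch under the reviewer's assumptions, not the actual patch; the real class has many more fields and methods.
{code}
// Rough sketch of AppSchedulable after addressing the review comments above
// (illustrative only).
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceWeights;

class AppSchedulableSketch {
  private final Priority priority;
  private final ResourceWeights resourceWeights;

  AppSchedulableSketch() {
    // Initialized once in the constructor (review point 2), using the static
    // factory instead of recordFactory and setPriority (review points 3 and 4).
    this.priority = Priority.newInstance(1);
    this.resourceWeights = new ResourceWeights();
  }

  public Priority getPriority() {
    // Per-app priorities are not passed to the scheduler yet, so every call
    // returns the same cached instance instead of allocating a new record.
    return priority;
  }

  // Renamed from getResourceWeightsObject() per review point 5.
  public ResourceWeights getResourceWeights() {
    return resourceWeights;
  }
}
{code}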
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951774#comment-13951774 ] Hong Zhiguo commented on YARN-1889: --- Hi, Sandy, During processing NodeUpdate events, the number of GC and the accumulated GC time is reduced about half. > avoid creating new objects on each fair scheduler call to AppSchedulable > comparator > --- > > Key: YARN-1889 > URL: https://issues.apache.org/jira/browse/YARN-1889 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Hong Zhiguo >Priority: Minor > Attachments: YARN-1889.patch > > > In fair scheduler, in each scheduling attempt, a full sort is > performed on List of AppSchedulable, which invokes Comparator.compare > method many times. Both FairShareComparator and DRFComparator call > AppSchedulable.getWeights, and AppSchedulable.getPriority. > A new ResourceWeights object is allocated on each call of getWeights, > and the same for getPriority. This introduces a lot of pressure to > GC because these methods are called very very frequently. > Below test case shows improvement on performance and GC behaviour. The > results show that the GC pressure during processing NodeUpdate is recuded > half by this patch. > The code to show the improvement: (Add it to TestFairScheduler.java) > import java.lang.management.GarbageCollectorMXBean; > import java.lang.management.ManagementFactory; > public void printGCStats() { > long totalGarbageCollections = 0; > long garbageCollectionTime = 0; > for(GarbageCollectorMXBean gc : > ManagementFactory.getGarbageCollectorMXBeans()) { > long count = gc.getCollectionCount(); > if(count >= 0) { > totalGarbageCollections += count; > } > long time = gc.getCollectionTime(); > if(time >= 0) { > garbageCollectionTime += time; > } > } > System.out.println("Total Garbage Collections: " > + totalGarbageCollections); > System.out.println("Total Garbage Collection Time (ms): " > + garbageCollectionTime); > } > @Test > public void testImpactOnGC() throws Exception { > scheduler.reinitialize(conf, resourceManager.getRMContext()); > // Add nodes > int numNode = 1; > for (int i = 0; i < numNode; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, > host); > NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); > scheduler.handle(nodeEvent); > assertEquals(1024 * 64 * (i+1), > scheduler.getClusterCapacity().getMemory()); > } > assertEquals(numNode, scheduler.getNumClusterNodes()); > assertEquals(1024 * 64 * numNode, > scheduler.getClusterCapacity().getMemory()); > // add apps, each app has 100 containers. 
> int minReqSize = > > FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; > int numApp = 8000; > int priority = 1; > for (int i = 1; i < numApp + 1; ++i) { > ApplicationAttemptId attemptId = createAppAttemptId(i, 1); > AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( > attemptId.getApplicationId(), "queue1", "user1"); > scheduler.handle(appAddedEvent); > AppAttemptAddedSchedulerEvent attemptAddedEvent = > new AppAttemptAddedSchedulerEvent(attemptId, false); > scheduler.handle(attemptAddedEvent); > createSchedulingRequestExistingApplication(minReqSize * 2, 1, > priority, attemptId); > } > scheduler.update(); > assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", > true) > .getRunnableAppSchedulables().size()); > System.out.println("GC stats before NodeUpdate processing:"); > printGCStats(); > int hb_num = 5000; > long start = System.nanoTime(); > for (int i = 0; i < hb_num; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, > host); > NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); > scheduler.handle(nodeEvent); > } > long end = System.nanoTime(); > System.out.printf("processing time for a NodeUpdate in average: %d us\n", > (end - start)/(hb_num * 1000)); > System.out.println("GC stats after NodeUpdate processing:"); > printGCStats(); > } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951773#comment-13951773 ] Hong Zhiguo commented on YARN-1889: --- HI, Fengdong, I'll update the patch according to your comments. Thanks > avoid creating new objects on each fair scheduler call to AppSchedulable > comparator > --- > > Key: YARN-1889 > URL: https://issues.apache.org/jira/browse/YARN-1889 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Hong Zhiguo >Priority: Minor > Attachments: YARN-1889.patch > > > In fair scheduler, in each scheduling attempt, a full sort is > performed on List of AppSchedulable, which invokes Comparator.compare > method many times. Both FairShareComparator and DRFComparator call > AppSchedulable.getWeights, and AppSchedulable.getPriority. > A new ResourceWeights object is allocated on each call of getWeights, > and the same for getPriority. This introduces a lot of pressure to > GC because these methods are called very very frequently. > Below test case shows improvement on performance and GC behaviour. The > results show that the GC pressure during processing NodeUpdate is recuded > half by this patch. > The code to show the improvement: (Add it to TestFairScheduler.java) > import java.lang.management.GarbageCollectorMXBean; > import java.lang.management.ManagementFactory; > public void printGCStats() { > long totalGarbageCollections = 0; > long garbageCollectionTime = 0; > for(GarbageCollectorMXBean gc : > ManagementFactory.getGarbageCollectorMXBeans()) { > long count = gc.getCollectionCount(); > if(count >= 0) { > totalGarbageCollections += count; > } > long time = gc.getCollectionTime(); > if(time >= 0) { > garbageCollectionTime += time; > } > } > System.out.println("Total Garbage Collections: " > + totalGarbageCollections); > System.out.println("Total Garbage Collection Time (ms): " > + garbageCollectionTime); > } > @Test > public void testImpactOnGC() throws Exception { > scheduler.reinitialize(conf, resourceManager.getRMContext()); > // Add nodes > int numNode = 1; > for (int i = 0; i < numNode; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, > host); > NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); > scheduler.handle(nodeEvent); > assertEquals(1024 * 64 * (i+1), > scheduler.getClusterCapacity().getMemory()); > } > assertEquals(numNode, scheduler.getNumClusterNodes()); > assertEquals(1024 * 64 * numNode, > scheduler.getClusterCapacity().getMemory()); > // add apps, each app has 100 containers. 
> int minReqSize = > > FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; > int numApp = 8000; > int priority = 1; > for (int i = 1; i < numApp + 1; ++i) { > ApplicationAttemptId attemptId = createAppAttemptId(i, 1); > AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( > attemptId.getApplicationId(), "queue1", "user1"); > scheduler.handle(appAddedEvent); > AppAttemptAddedSchedulerEvent attemptAddedEvent = > new AppAttemptAddedSchedulerEvent(attemptId, false); > scheduler.handle(attemptAddedEvent); > createSchedulingRequestExistingApplication(minReqSize * 2, 1, > priority, attemptId); > } > scheduler.update(); > assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", > true) > .getRunnableAppSchedulables().size()); > System.out.println("GC stats before NodeUpdate processing:"); > printGCStats(); > int hb_num = 5000; > long start = System.nanoTime(); > for (int i = 0; i < hb_num; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, > host); > NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); > scheduler.handle(nodeEvent); > } > long end = System.nanoTime(); > System.out.printf("processing time for a NodeUpdate in average: %d us\n", > (end - start)/(hb_num * 1000)); > System.out.println("GC stats after NodeUpdate processing:"); > printGCStats(); > } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951704#comment-13951704 ] Vinod Kumar Vavilapalli commented on YARN-1879: --- *Sigh* I meant "For now, split this JIRA and just make allocate() API {{AtMostOnce}}? That is the API that is causing our tests to fail. Thoughts?" > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951700#comment-13951700 ] Vinod Kumar Vavilapalli commented on YARN-1879: --- That's true. I mentioned retry-cache but didn't say AtMostOnce. My bad. But I remember the register/unregister APIs didn't have the lastResponse logic, so register/unregister should be enhanced with retry-cache support. For now, split this JIRA and just make the allocate() API idempotent? That is the API that is causing our tests to fail. Thoughts? > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951696#comment-13951696 ] Jian He commented on YARN-1879: --- allocate() is similar to nodeHeartBeat, which returns the previous response even with multiple retries, so it should be AtMostOnce? > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.2#6252)
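A stripped-down sketch of the behaviour being compared here -- illustrative only, with invented names; the real ApplicationMasterService keeps an AllocateResponse per attempt -- is that a retried heartbeat carrying the previous responseId simply gets the cached last response back instead of being processed twice.
{code}
// Illustrative sketch of a lastResponse-style retry cache -- not the actual
// ApplicationMasterService code.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HeartbeatCacheSketch {
  static class Response {
    final int responseId;
    Response(int responseId) { this.responseId = responseId; }
  }

  private final Map<String, Response> lastResponse =
      new ConcurrentHashMap<String, Response>();

  Response allocate(String appAttemptId, int requestResponseId) {
    Response last = lastResponse.get(appAttemptId);
    if (last != null && requestResponseId + 1 == last.responseId) {
      // Retried/duplicate heartbeat: return the previously computed response
      // rather than processing the request a second time.
      return last;
    }
    Response next = new Response(requestResponseId + 1); // real allocation work elided
    lastResponse.put(appAttemptId, next);
    return next;
  }
}
{code}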
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951693#comment-13951693 ] Vinod Kumar Vavilapalli commented on YARN-1879: --- BTW, +1 for marking all of them as idempotent. We already have retry-cache like mechanism built into ResourceManager. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1879: -- Priority: Blocker (was: Major) Target Version/s: 2.4.0 (was: 2.5.0) Okay, we have been running some tests with RM HA and because of these missing annotations, we are running into apps that fail during fail-over. I think this should be fixed in 2.4.0 itself. [~ozawa]/[~xgong]/[~jianhe], can we make progress on this? Thanks! > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1017) Document RM Restart feature
[ https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951647#comment-13951647 ] Hudson commented on YARN-1017: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5429 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5429/]) YARN-1017. Added documentation for ResourceManager Restart. (jianhe) (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582913) * /hadoop/common/trunk/hadoop-project/src/site/site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRestart.apt.vm > Document RM Restart feature > --- > > Key: YARN-1017 > URL: https://issues.apache.org/jira/browse/YARN-1017 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He >Priority: Blocker > Fix For: 2.4.0 > > Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, > rm-restart-doc-3.patch > > > This should give users a general idea about how RM Restart works and how to > use RM Restart -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951644#comment-13951644 ] Jian He commented on YARN-1885: --- One possible fix: today, on NM resync, we ignore all the container statuses except the AM container's. We can change this to notify RMAppAttempt about the other containers too, so that the new attempt knows the nodes on which the previous containers ran. > yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta > > During our HA testing we have seen cases where yarn application logs are not > available through the CLI but I can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port
[ https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951591#comment-13951591 ] Karthik Kambatla commented on YARN-1888: Don't think this is a bug. People should be able to run multiple NMs on a node. The port 0 is primarily for convenience. > Not add NodeManager to inactiveRMNodes when reboot NodeManager which have > different port > > > Key: YARN-1888 > URL: https://issues.apache.org/jira/browse/YARN-1888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: zhaoyunjiong >Priority: Minor > Attachments: YARN-1888.patch > > > When the NodeManager's port is set to 0, rebooting the NodeManager makes the "Lost Nodes" > count inaccurate. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951573#comment-13951573 ] Jian He commented on YARN-1885: --- If this is the last AM retry, no new AM will be created, and the NM, on re-sync after the RM restarts, won't get the notification and won't aggregate the container logs for this application at all. > yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta > > During our HA testing we have seen cases where yarn application logs are not > available through the CLI but I can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951556#comment-13951556 ] Jian He commented on YARN-1885: --- Talked with Vinod offline. One thing we observed is that after an RM restart, if the new AM is not scheduled any containers on the nodes on which the previous AM's containers were running, those NMs won't get the application_completed signal from the RM when the new AM completes, and so they won't aggregate the logs for the previous containers of this application. But that should only be a leak of local container logs; yarn logs -applicationId shouldn't return nothing. It should at least return the latest AM's logs. > yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta > > During our HA testing we have seen cases where yarn application logs are not > available through the CLI but I can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1892) Excessive logging in RM
Siddharth Seth created YARN-1892: Summary: Excessive logging in RM Key: YARN-1892 URL: https://issues.apache.org/jira/browse/YARN-1892 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Seth Priority: Minor Mostly in the CS I believe {code} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1395435468498_0011 reserved container container_1395435468498_0011_01_000213 on node host: #containers=5 available=4096 used=20960, currently has 1 at priority 4; currentReservation 4096 {code} {code} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: hive2 usedResources: clusterResources: currentCapacity 0.25 required potentialNewCapacity: 0.255 ( max-capacity: 0.25) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
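One common way to quiet per-heartbeat messages like the ones quoted above -- shown only as a sketch, not the committed fix for this JIRA, and with invented class and method names -- is to demote them to debug level and guard the string construction:
{code}
// Sketch only -- not the actual CapacityScheduler change.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class ReservationLoggingSketch {
  private static final Log LOG = LogFactory.getLog(ReservationLoggingSketch.class);

  void onReservation(String applicationId, String containerId, String node) {
    // Guarding with isDebugEnabled() avoids both the per-heartbeat log volume
    // and the string concatenation cost when debug logging is off.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Application " + applicationId + " reserved container "
          + containerId + " on node " + node);
    }
  }
}
{code}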
[jira] [Commented] (YARN-1017) Document RM Restart feature
[ https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951540#comment-13951540 ] Hadoop QA commented on YARN-1017: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637556/rm-restart-doc-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3487//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3487//console This message is automatically generated. > Document RM Restart feature > --- > > Key: YARN-1017 > URL: https://issues.apache.org/jira/browse/YARN-1017 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He >Priority: Blocker > Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, > rm-restart-doc-3.patch > > > This should give users a general idea about how RM Restart works and how to > use RM Restart -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring
[ https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951485#comment-13951485 ] Hudson commented on YARN-1891: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5427 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5427/]) YARN-1891. Added documentation for NodeManager health-monitoring. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582891) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManager.apt.vm > Document NodeManager health-monitoring > -- > > Key: YARN-1891 > URL: https://issues.apache.org/jira/browse/YARN-1891 > Project: Hadoop YARN > Issue Type: Task >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Minor > Fix For: 2.4.0 > > Attachments: apache-yarn-1891.0.patch > > > Start documenting node manager starting with the health monitoring. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1017) Document RM Restart feature
[ https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1017: -- Attachment: rm-restart-doc-3.patch Thanks Vinod for the review, updated the doc > Document RM Restart feature > --- > > Key: YARN-1017 > URL: https://issues.apache.org/jira/browse/YARN-1017 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He >Priority: Blocker > Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, > rm-restart-doc-3.patch > > > This should give users a general idea about how RM Restart works and how to > use RM Restart -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring
[ https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951459#comment-13951459 ] Vinod Kumar Vavilapalli commented on YARN-1891: --- Great, this looks good, +1. Checking this in. > Document NodeManager health-monitoring > -- > > Key: YARN-1891 > URL: https://issues.apache.org/jira/browse/YARN-1891 > Project: Hadoop YARN > Issue Type: Task >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Minor > Attachments: apache-yarn-1891.0.patch > > > Start documenting node manager starting with the health monitoring. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951458#comment-13951458 ] Sandy Ryza commented on YARN-1864: -- With your proposal, would it be possible to choose different parent queues depending on the submission? A couple of policies we'd like to ultimately be able to support would be "choose the parent queue based on the user's primary group, and then place the job into a queue underneath it with the user's name" and "if a user is any of a defined set of users, put it in a particular parent queue, and then create a child queue underneath it with the user's name". > Fair Scheduler Dynamic Hierarchical User Queues > --- > > Key: YARN-1864 > URL: https://issues.apache.org/jira/browse/YARN-1864 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Ashwin Shankar > Labels: scheduler > Attachments: YARN-1864-v1.txt > > > In Fair Scheduler, we want to be able to create user queues under any parent > queue in the hierarchy. For example, say user1 submits a job to a parent queue > called root.allUserQueues; we want to be able to create a new queue called > root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted > by this user to root.allUserQueues will be run in this newly created > root.allUserQueues.user1. > This is very similar to the 'user-as-default' feature in Fair Scheduler which > creates user queues under the root queue. But we want the ability to create user > queues under ANY parent queue. > Why do we want this? > 1. Preemption: these dynamically created user queues can preempt each other > if their fair share is not met. So there is fairness among users. > User queues can also preempt other non-user leaf queues as well if below fair > share. > 2. Allocation to user queues: we want all the user queries (ad hoc) to consume > only a fraction of resources in the shared cluster. By creating this > feature, we could do that by giving a fair share to the parent user queue > which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951456#comment-13951456 ] Hadoop QA commented on YARN-596: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637524/YARN-596.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3486//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3486//console This message is automatically generated. > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch, YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting it's containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over it's fair share is just as likely to have containers preempted > as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1017) Document RM Restart feature
[ https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951427#comment-13951427 ] Vinod Kumar Vavilapalli commented on YARN-1017: --- +1, except for a couple of places where you say "configures" instead of "configuration". > Document RM Restart feature > --- > > Key: YARN-1017 > URL: https://issues.apache.org/jira/browse/YARN-1017 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He >Priority: Blocker > Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch > > > This should give users a general idea about how RM Restart works and how to > use RM Restart -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-596: - Attachment: YARN-596.patch Update a new patch without cloneQueueApps(). Instead, add preemptedResources in the FSSchedulerApp. > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch, YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting it's containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over it's fair share is just as likely to have containers preempted > as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups
[ https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951290#comment-13951290 ] Hudson commented on YARN-1883: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5424 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5424/]) YARN-1883. TestRMAdminService fails due to inconsistent entries in UserGroups (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582862) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java > TestRMAdminService fails due to inconsistent entries in UserGroups > -- > > Key: YARN-1883 > URL: https://issues.apache.org/jira/browse/YARN-1883 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.0 >Reporter: Mit Desai >Assignee: Mit Desai > Labels: java7 > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-1883.patch, YARN-1883.patch > > > testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails > with the following error: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:92) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertTrue(Assert.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104) > {noformat} > Line Numbers will be inconsistent as I was testing to run it in a particular > order. But the Line on which the failure occurs is > {code} > Assert.assertTrue(groupBefore.contains("test_group_A") > && groupBefore.contains("test_group_B") > && groupBefore.contains("test_group_C") && groupBefore.size() == 3); > {code} > testRMInitialsWithFileSystemBasedConfigurationProvider() and > testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() > calls the function {{MockUnixGroupsMapping.updateGroups();}} which changes > the list of userGroups. > testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() > tries to verify the groups before changing it and fails if > testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made > the changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups
[ https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951270#comment-13951270 ] Jonathan Eagles commented on YARN-1883: --- +1. Thanks for cleaning this test up. The double bracket initialization that was there before is considered a hack since it is creating an anonymous subclass with a static initialization. Committing to trunk and branch-2. jeagles > TestRMAdminService fails due to inconsistent entries in UserGroups > -- > > Key: YARN-1883 > URL: https://issues.apache.org/jira/browse/YARN-1883 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.0 >Reporter: Mit Desai >Assignee: Mit Desai > Labels: java7 > Attachments: YARN-1883.patch, YARN-1883.patch > > > testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails > with the following error: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:92) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertTrue(Assert.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104) > {noformat} > Line Numbers will be inconsistent as I was testing to run it in a particular > order. But the Line on which the failure occurs is > {code} > Assert.assertTrue(groupBefore.contains("test_group_A") > && groupBefore.contains("test_group_B") > && groupBefore.contains("test_group_C") && groupBefore.size() == 3); > {code} > testRMInitialsWithFileSystemBasedConfigurationProvider() and > testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() > calls the function {{MockUnixGroupsMapping.updateGroups();}} which changes > the list of userGroups. > testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() > tries to verify the groups before changing it and fails if > testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made > the changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
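For readers unfamiliar with the pattern being called a hack above: double-brace initialization creates an anonymous subclass whose instance initializer populates the collection. A small illustrative comparison (group names taken from the test above; the wrapper class name is invented):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupListSketch {
  // Double-brace initialization: an anonymous ArrayList subclass whose
  // instance initializer adds the entries -- concise, but it creates an extra
  // class and can hold a reference to the enclosing instance.
  List<String> doubleBrace = new ArrayList<String>() {{
    add("test_group_A");
    add("test_group_B");
    add("test_group_C");
  }};

  // Plain alternative with no anonymous subclass.
  List<String> plain = new ArrayList<String>(
      Arrays.asList("test_group_A", "test_group_B", "test_group_C"));
}
{code}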
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950749#comment-13950749 ] Fengdong Yu commented on YARN-1889: --- Good catch, Zhiguo. can you add some test cases in your patch? please replace 'tab' in your code with 'space'. {code} + private Priority priority = recordFactory.newRecordInstance(Priority.class); + private ResourceWeights resourceWeights = new ResourceWeights(); {code} can you add these to the constructor? {code} + public ResourceWeights getResourceWeightsObject() { + return resourceWeights; + } {code} It would be better for the name "getResourceWeights()" > avoid creating new objects on each fair scheduler call to AppSchedulable > comparator > --- > > Key: YARN-1889 > URL: https://issues.apache.org/jira/browse/YARN-1889 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Hong Zhiguo >Priority: Minor > Attachments: YARN-1889.patch > > > In fair scheduler, in each scheduling attempt, a full sort is > performed on List of AppSchedulable, which invokes Comparator.compare > method many times. Both FairShareComparator and DRFComparator call > AppSchedulable.getWeights, and AppSchedulable.getPriority. > A new ResourceWeights object is allocated on each call of getWeights, > and the same for getPriority. This introduces a lot of pressure to > GC because these methods are called very very frequently. > Below test case shows improvement on performance and GC behaviour. The > results show that the GC pressure during processing NodeUpdate is recuded > half by this patch. > The code to show the improvement: (Add it to TestFairScheduler.java) > import java.lang.management.GarbageCollectorMXBean; > import java.lang.management.ManagementFactory; > public void printGCStats() { > long totalGarbageCollections = 0; > long garbageCollectionTime = 0; > for(GarbageCollectorMXBean gc : > ManagementFactory.getGarbageCollectorMXBeans()) { > long count = gc.getCollectionCount(); > if(count >= 0) { > totalGarbageCollections += count; > } > long time = gc.getCollectionTime(); > if(time >= 0) { > garbageCollectionTime += time; > } > } > System.out.println("Total Garbage Collections: " > + totalGarbageCollections); > System.out.println("Total Garbage Collection Time (ms): " > + garbageCollectionTime); > } > @Test > public void testImpactOnGC() throws Exception { > scheduler.reinitialize(conf, resourceManager.getRMContext()); > // Add nodes > int numNode = 1; > for (int i = 0; i < numNode; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, > host); > NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); > scheduler.handle(nodeEvent); > assertEquals(1024 * 64 * (i+1), > scheduler.getClusterCapacity().getMemory()); > } > assertEquals(numNode, scheduler.getNumClusterNodes()); > assertEquals(1024 * 64 * numNode, > scheduler.getClusterCapacity().getMemory()); > // add apps, each app has 100 containers. 
> int minReqSize = > > FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; > int numApp = 8000; > int priority = 1; > for (int i = 1; i < numApp + 1; ++i) { > ApplicationAttemptId attemptId = createAppAttemptId(i, 1); > AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( > attemptId.getApplicationId(), "queue1", "user1"); > scheduler.handle(appAddedEvent); > AppAttemptAddedSchedulerEvent attemptAddedEvent = > new AppAttemptAddedSchedulerEvent(attemptId, false); > scheduler.handle(attemptAddedEvent); > createSchedulingRequestExistingApplication(minReqSize * 2, 1, > priority, attemptId); > } > scheduler.update(); > assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", > true) > .getRunnableAppSchedulables().size()); > System.out.println("GC stats before NodeUpdate processing:"); > printGCStats(); > int hb_num = 5000; > long start = System.nanoTime(); > for (int i = 0; i < hb_num; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, > host); > NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); > scheduler.handle(nodeEvent); > } > long end = System.nanoTime(); > System.out.printf("processing time for a NodeUp
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951126#comment-13951126 ] Wei Yan commented on YARN-596: -- I thought about that approach before, but I forget why I gave it up. I'll update a patch using that approach. > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting its containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over its fair share is just as likely to have containers preempted > as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951040#comment-13951040 ] Sandy Ryza commented on YARN-596: - Cloning the full queue/app tree makes preemption O(apps) vs. O(log(apps)), which seems expensive to do every time we want to preempt. Could we possibly add a field to Schedulable called something like preemptedResources that gets subtracted when calculating getResourceUsage, and cleared out at the end of preemptResources? Also, minor nit: better to use "sched" than "sche" for consistency with other places in the code. > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting it's containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over it's fair share is just as likely to have containers preempted > as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
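A rough sketch of the suggestion above -- field and method names are illustrative, not the final YARN-596 patch: the schedulable tracks resources already chosen for preemption, reports usage net of them, and clears the counter once preemptResources() finishes its round.
{code}
// Illustrative sketch only -- not the committed FSSchedulerApp change.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class PreemptedResourcesSketch {
  private final Resource currentConsumption = Resources.createResource(8192, 8);
  private final Resource preemptedResources = Resources.createResource(0, 0);

  Resource getResourceUsage() {
    // Usage seen by the preemption comparator excludes containers already
    // marked for preemption, so the same app is not picked repeatedly.
    return Resources.subtract(currentConsumption, preemptedResources);
  }

  void addPreemption(Resource containerResource) {
    Resources.addTo(preemptedResources, containerResource);
  }

  void clearPreemptedResources() {
    // Called at the end of a preemptResources() round.
    preemptedResources.setMemory(0);
    preemptedResources.setVirtualCores(0);
  }
}
{code}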
[jira] [Updated] (YARN-1853) Allow containers to be ran under real user even in insecure mode
[ https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated YARN-1853: --- Component/s: resourcemanager > Allow containers to be ran under real user even in insecure mode > > > Key: YARN-1853 > URL: https://issues.apache.org/jira/browse/YARN-1853 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.2.0 >Reporter: Andrey Stepachev > Attachments: YARN-1853.patch, YARN-1853.patch > > > Currently an insecure cluster runs all containers under one user (typically > nobody). That is not appropriate, because YARN applications don't play well > with HDFS when permissions are enabled. YARN applications try to write data (as > expected) into /user/nobody regardless of the user who launched the application. > Another side effect is that it is not possible to configure cgroups for > particular users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950976#comment-13950976 ] Arpit Gupta commented on YARN-1885: --- yarn logs -applicationId returned nothing. And the application was done when I checked the RM UI. > yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta > > During our HA testing we have seen cases where yarn application logs are not > available through the CLI but I can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950999#comment-13950999 ] Sandy Ryza commented on YARN-1889: -- When you say "gc pressure", which is going down? The number of gc's or the time spent in each gc (or both)? > avoid creating new objects on each fair scheduler call to AppSchedulable > comparator > --- > > Key: YARN-1889 > URL: https://issues.apache.org/jira/browse/YARN-1889 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Hong Zhiguo >Priority: Minor > Attachments: YARN-1889.patch > > > In fair scheduler, in each scheduling attempt, a full sort is > performed on List of AppSchedulable, which invokes Comparator.compare > method many times. Both FairShareComparator and DRFComparator call > AppSchedulable.getWeights, and AppSchedulable.getPriority. > A new ResourceWeights object is allocated on each call of getWeights, > and the same for getPriority. This introduces a lot of pressure to > GC because these methods are called very very frequently. > Below test case shows improvement on performance and GC behaviour. The > results show that the GC pressure during processing NodeUpdate is recuded > half by this patch. > The code to show the improvement: (Add it to TestFairScheduler.java) > import java.lang.management.GarbageCollectorMXBean; > import java.lang.management.ManagementFactory; > public void printGCStats() { > long totalGarbageCollections = 0; > long garbageCollectionTime = 0; > for(GarbageCollectorMXBean gc : > ManagementFactory.getGarbageCollectorMXBeans()) { > long count = gc.getCollectionCount(); > if(count >= 0) { > totalGarbageCollections += count; > } > long time = gc.getCollectionTime(); > if(time >= 0) { > garbageCollectionTime += time; > } > } > System.out.println("Total Garbage Collections: " > + totalGarbageCollections); > System.out.println("Total Garbage Collection Time (ms): " > + garbageCollectionTime); > } > @Test > public void testImpactOnGC() throws Exception { > scheduler.reinitialize(conf, resourceManager.getRMContext()); > // Add nodes > int numNode = 1; > for (int i = 0; i < numNode; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, > host); > NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); > scheduler.handle(nodeEvent); > assertEquals(1024 * 64 * (i+1), > scheduler.getClusterCapacity().getMemory()); > } > assertEquals(numNode, scheduler.getNumClusterNodes()); > assertEquals(1024 * 64 * numNode, > scheduler.getClusterCapacity().getMemory()); > // add apps, each app has 100 containers. 
> int minReqSize = > > FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; > int numApp = 8000; > int priority = 1; > for (int i = 1; i < numApp + 1; ++i) { > ApplicationAttemptId attemptId = createAppAttemptId(i, 1); > AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( > attemptId.getApplicationId(), "queue1", "user1"); > scheduler.handle(appAddedEvent); > AppAttemptAddedSchedulerEvent attemptAddedEvent = > new AppAttemptAddedSchedulerEvent(attemptId, false); > scheduler.handle(attemptAddedEvent); > createSchedulingRequestExistingApplication(minReqSize * 2, 1, > priority, attemptId); > } > scheduler.update(); > assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", > true) > .getRunnableAppSchedulables().size()); > System.out.println("GC stats before NodeUpdate processing:"); > printGCStats(); > int hb_num = 5000; > long start = System.nanoTime(); > for (int i = 0; i < hb_num; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, > host); > NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); > scheduler.handle(nodeEvent); > } > long end = System.nanoTime(); > System.out.printf("processing time for a NodeUpdate in average: %d us\n", > (end - start)/(hb_num * 1000)); > System.out.println("GC stats after NodeUpdate processing:"); > printGCStats(); > } -- This message was sent by Atlassian JIRA (v6.2#6252)
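The allocation pattern described in the issue (a fresh ResourceWeights/Priority object created on every getWeights()/getPriority() call inside the comparator) can be illustrated outside the scheduler. Below is a minimal, self-contained sketch, assuming the fix is simply to hand out a cached object from the getter; the Weights and Schedulable classes are illustrative stand-ins, not the actual ResourceWeights/AppSchedulable types.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ComparatorAllocationSketch {

  // Stand-in for ResourceWeights: a tiny value object.
  static final class Weights {
    final double weight;
    Weights(double weight) { this.weight = weight; }
  }

  // Stand-in for AppSchedulable: caches its Weights instead of
  // allocating a new one on every getWeights() call.
  static final class Schedulable {
    private final Weights cachedWeights;

    Schedulable(double weight) {
      this.cachedWeights = new Weights(weight);
    }

    Weights getWeights() {
      return cachedWeights;  // no per-call allocation
    }
  }

  public static void main(String[] args) {
    List<Schedulable> apps = new ArrayList<Schedulable>();
    for (int i = 0; i < 8000; i++) {
      apps.add(new Schedulable(i % 10 + 1));
    }
    // A full sort calls the comparator O(n log n) times; with the cached
    // getter the comparator itself creates no garbage during the sort.
    Collections.sort(apps, new Comparator<Schedulable>() {
      @Override
      public int compare(Schedulable a, Schedulable b) {
        return Double.compare(a.getWeights().weight, b.getWeights().weight);
      }
    });
    System.out.println("sorted " + apps.size() + " schedulables");
  }
}
{code}

With a cached getter, the repeated sorts performed on every NodeUpdate stop producing short-lived objects in the comparator, which is where the reduction in GC pressure reported above would come from.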
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951002#comment-13951002 ] Sandy Ryza commented on YARN-1889: -- Another nit: Priority.newInstance should be used instead of recordFactory > avoid creating new objects on each fair scheduler call to AppSchedulable > comparator > --- > > Key: YARN-1889 > URL: https://issues.apache.org/jira/browse/YARN-1889 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Hong Zhiguo >Priority: Minor > Attachments: YARN-1889.patch > > > In fair scheduler, in each scheduling attempt, a full sort is > performed on List of AppSchedulable, which invokes Comparator.compare > method many times. Both FairShareComparator and DRFComparator call > AppSchedulable.getWeights, and AppSchedulable.getPriority. > A new ResourceWeights object is allocated on each call of getWeights, > and the same for getPriority. This introduces a lot of pressure to > GC because these methods are called very very frequently. > Below test case shows improvement on performance and GC behaviour. The > results show that the GC pressure during processing NodeUpdate is recuded > half by this patch. > The code to show the improvement: (Add it to TestFairScheduler.java) > import java.lang.management.GarbageCollectorMXBean; > import java.lang.management.ManagementFactory; > public void printGCStats() { > long totalGarbageCollections = 0; > long garbageCollectionTime = 0; > for(GarbageCollectorMXBean gc : > ManagementFactory.getGarbageCollectorMXBeans()) { > long count = gc.getCollectionCount(); > if(count >= 0) { > totalGarbageCollections += count; > } > long time = gc.getCollectionTime(); > if(time >= 0) { > garbageCollectionTime += time; > } > } > System.out.println("Total Garbage Collections: " > + totalGarbageCollections); > System.out.println("Total Garbage Collection Time (ms): " > + garbageCollectionTime); > } > @Test > public void testImpactOnGC() throws Exception { > scheduler.reinitialize(conf, resourceManager.getRMContext()); > // Add nodes > int numNode = 1; > for (int i = 0; i < numNode; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, > host); > NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); > scheduler.handle(nodeEvent); > assertEquals(1024 * 64 * (i+1), > scheduler.getClusterCapacity().getMemory()); > } > assertEquals(numNode, scheduler.getNumClusterNodes()); > assertEquals(1024 * 64 * numNode, > scheduler.getClusterCapacity().getMemory()); > // add apps, each app has 100 containers. 
> int minReqSize = > > FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; > int numApp = 8000; > int priority = 1; > for (int i = 1; i < numApp + 1; ++i) { > ApplicationAttemptId attemptId = createAppAttemptId(i, 1); > AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( > attemptId.getApplicationId(), "queue1", "user1"); > scheduler.handle(appAddedEvent); > AppAttemptAddedSchedulerEvent attemptAddedEvent = > new AppAttemptAddedSchedulerEvent(attemptId, false); > scheduler.handle(attemptAddedEvent); > createSchedulingRequestExistingApplication(minReqSize * 2, 1, > priority, attemptId); > } > scheduler.update(); > assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", > true) > .getRunnableAppSchedulables().size()); > System.out.println("GC stats before NodeUpdate processing:"); > printGCStats(); > int hb_num = 5000; > long start = System.nanoTime(); > for (int i = 0; i < hb_num; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, > host); > NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); > scheduler.handle(nodeEvent); > } > long end = System.nanoTime(); > System.out.printf("processing time for a NodeUpdate in average: %d us\n", > (end - start)/(hb_num * 1000)); > System.out.println("GC stats after NodeUpdate processing:"); > printGCStats(); > } -- This message was sent by Atlassian JIRA (v6.2#6252)
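For context on that nit, here is a small hedged comparison of the two creation styles, assuming the YARN API classes are on the classpath; the exact call site in the patch may look different.

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.factories.RecordFactory;
import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;

public class PriorityCreationSketch {
  public static void main(String[] args) {
    // Older style: go through the record factory, then set the value.
    RecordFactory recordFactory = RecordFactoryProvider.getRecordFactory(null);
    Priority viaFactory = recordFactory.newRecordInstance(Priority.class);
    viaFactory.setPriority(1);

    // Preferred style: the static factory method on the record itself.
    Priority viaNewInstance = Priority.newInstance(1);

    System.out.println(viaFactory.getPriority() == viaNewInstance.getPriority());
  }
}
{code}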
[jira] [Updated] (YARN-1853) Allow containers to be ran under real user even in insecure mode
[ https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated YARN-1853: --- Attachment: YARN-1853.patch Updated patch. RMAppManager should check for the existence of the user before submitting an app in insecure mode, and reject the submission if no such user is found. (This patch is defensive: the user is checked only in non-impersonation mode.) > Allow containers to be ran under real user even in insecure mode > > > Key: YARN-1853 > URL: https://issues.apache.org/jira/browse/YARN-1853 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Andrey Stepachev > Attachments: YARN-1853.patch, YARN-1853.patch > > > Currently an insecure cluster runs all containers under one user (typically > nobody). That is not appropriate, because YARN applications don't play well > with HDFS when permissions are enabled. YARN applications try to write data (as > expected) into /user/nobody regardless of the user who launched the application. > Another side effect is that it is not possible to configure cgroups for > particular users. -- This message was sent by Atlassian JIRA (v6.2#6252)
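A rough sketch of the defensive check being described, assuming it is implemented by shelling out to `id`, which is a common way to resolve local users from Hadoop code; the helper name and the rejection message below are hypothetical, not taken from the patch.

{code:java}
import java.io.IOException;

import org.apache.hadoop.util.Shell;
import org.apache.hadoop.util.Shell.ExitCodeException;

public class UserExistenceCheckSketch {

  /**
   * Returns true if the given user exists on the local system,
   * determined by running "id <user>" and checking the exit code.
   */
  static boolean userExists(String user) throws IOException {
    try {
      Shell.execCommand("id", user);
      return true;
    } catch (ExitCodeException e) {
      // "id" exits non-zero when the user is unknown.
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    String user = args.length > 0 ? args[0] : "nobody";
    if (!userExists(user)) {
      // In the RM this would translate to rejecting the submission.
      System.err.println("Rejecting submission: unknown user " + user);
    } else {
      System.out.println("User " + user + " exists, submission allowed");
    }
  }
}
{code}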
[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring
[ https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950624#comment-13950624 ] Hadoop QA commented on YARN-1891: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637374/apache-yarn-1891.0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3485//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3485//console This message is automatically generated. > Document NodeManager health-monitoring > -- > > Key: YARN-1891 > URL: https://issues.apache.org/jira/browse/YARN-1891 > Project: Hadoop YARN > Issue Type: Task >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Minor > Attachments: apache-yarn-1891.0.patch > > > Start documenting node manager starting with the health monitoring. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.
[ https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950590#comment-13950590 ] Hadoop QA commented on YARN-1703: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637370/YARN-1703.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3484//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3484//console This message is automatically generated. > Too many connections are opened for proxy server when applicationMaster UI is > accessed. > --- > > Key: YARN-1703 > URL: https://issues.apache.org/jira/browse/YARN-1703 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-1703.1.patch, YARN-1703.2.patch > > > If running job is accessed for progress. there many CLOSE_WAIT connections > for proxyserver. Eventhough connection is released, it makes available > again to the HttpClient instance, but does not close it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1891) Document NodeManager health-monitoring
[ https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1891: Attachment: apache-yarn-1891.0.patch Patch to create NodeManager documentation and document health monitoring. > Document NodeManager health-monitoring > -- > > Key: YARN-1891 > URL: https://issues.apache.org/jira/browse/YARN-1891 > Project: Hadoop YARN > Issue Type: Task >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Minor > Attachments: apache-yarn-1891.0.patch > > > Start documenting node manager starting with the health monitoring. -- This message was sent by Atlassian JIRA (v6.2#6252)
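As a hedged illustration of the kind of material such documentation would cover, the sketch below sets the standard NodeManager health-checker properties programmatically; the script path and the timings are placeholders, not values taken from the patch.

{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeHealthConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();

    // Script the NodeManager runs periodically to decide node health;
    // a line of output starting with "ERROR" marks the node unhealthy.
    conf.set("yarn.nodemanager.health-checker.script.path",
        "/usr/local/bin/nm-health-check.sh");   // placeholder path

    // How often the script runs and how long it may take before
    // being treated as failed.
    conf.setLong("yarn.nodemanager.health-checker.interval-ms", 10 * 60 * 1000L);
    conf.setLong("yarn.nodemanager.health-checker.script.timeout-ms", 20 * 60 * 1000L);

    System.out.println("health script: "
        + conf.get("yarn.nodemanager.health-checker.script.path"));
  }
}
{code}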
[jira] [Created] (YARN-1891) Document NodeManager health-monitoring
Varun Vasudev created YARN-1891: --- Summary: Document NodeManager health-monitoring Key: YARN-1891 URL: https://issues.apache.org/jira/browse/YARN-1891 Project: Hadoop YARN Issue Type: Task Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Minor Start documenting node manager starting with the health monitoring. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950576#comment-13950576 ] Hadoop QA commented on YARN-1889: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637367/YARN-1889.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3483//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3483//console This message is automatically generated. > avoid creating new objects on each fair scheduler call to AppSchedulable > comparator > --- > > Key: YARN-1889 > URL: https://issues.apache.org/jira/browse/YARN-1889 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Hong Zhiguo >Priority: Minor > Attachments: YARN-1889.patch > > > In fair scheduler, in each scheduling attempt, a full sort is > performed on List of AppSchedulable, which invokes Comparator.compare > method many times. Both FairShareComparator and DRFComparator call > AppSchedulable.getWeights, and AppSchedulable.getPriority. > A new ResourceWeights object is allocated on each call of getWeights, > and the same for getPriority. This introduces a lot of pressure to > GC because these methods are called very very frequently. > Below test case shows improvement on performance and GC behaviour. The > results show that the GC pressure during processing NodeUpdate is recuded > half by this patch. 
> The code to show the improvement: (Add it to TestFairScheduler.java) > import java.lang.management.GarbageCollectorMXBean; > import java.lang.management.ManagementFactory; > public void printGCStats() { > long totalGarbageCollections = 0; > long garbageCollectionTime = 0; > for(GarbageCollectorMXBean gc : > ManagementFactory.getGarbageCollectorMXBeans()) { > long count = gc.getCollectionCount(); > if(count >= 0) { > totalGarbageCollections += count; > } > long time = gc.getCollectionTime(); > if(time >= 0) { > garbageCollectionTime += time; > } > } > System.out.println("Total Garbage Collections: " > + totalGarbageCollections); > System.out.println("Total Garbage Collection Time (ms): " > + garbageCollectionTime); > } > @Test > public void testImpactOnGC() throws Exception { > scheduler.reinitialize(conf, resourceManager.getRMContext()); > // Add nodes > int numNode = 1; > for (int i = 0; i < numNode; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, > host); > NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); > scheduler.handle(nodeEvent); > assertEquals(1024 * 64 * (i+1), > scheduler.getClusterCapacity().getMemory()); > } > assertEquals(numNode, scheduler.getNumClusterNodes()); > assertEquals(1024 * 64 * numNode, > scheduler.getClusterCapacity().getMemory()); > // add apps, each app has 100 containers. > int minReqSize = > > FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; > int numApp = 8000; > int priority = 1; > for (int i = 1; i < numApp + 1; ++i) { > ApplicationAttemptId attemptId = createAppAttemptId(i, 1); > AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( > attemptId.getApplicationId(), "queue1", "user1"); > scheduler.h
[jira] [Commented] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.
[ https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950566#comment-13950566 ] Rohith commented on YARN-1890: -- Should the log priority be changed to DEBUG? I do not understand why so many GET requests are being sent :-( > Too many unnecessary logs are logged while accessing applicationMaster web UI. > -- > > Key: YARN-1890 > URL: https://issues.apache.org/jira/browse/YARN-1890 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith >Assignee: Rohith >Priority: Minor > > Accessing applicationMaster UI which is redirected from RM UI, logs too many > logs in ResourceManager logs and ProxyServer logs. On every refresh, logging > is done at WebAppProxyServlet.doGet(). All my RM and Proxyserver logs are > filled with UI information logs which are not really necessary for user. -- This message was sent by Atlassian JIRA (v6.2#6252)
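A minimal sketch of what moving that logging to DEBUG would look like, assuming commons-logging as used elsewhere in the proxy code; the method and message below are stand-ins, not the actual WebAppProxyServlet code.

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ProxyLoggingSketch {
  private static final Log LOG = LogFactory.getLog(ProxyLoggingSketch.class);

  // Hypothetical stand-in for the per-request logging in doGet():
  // at INFO every static-resource fetch ends up in the RM/proxy logs,
  // at DEBUG the message only appears when explicitly enabled.
  static void logAccess(String remoteUser, String uri, String appId, String owner) {
    if (LOG.isDebugEnabled()) {
      LOG.debug(remoteUser + " is accessing unchecked " + uri
          + " which is the app master GUI of " + appId + " owned by " + owner);
    }
  }

  public static void main(String[] args) {
    logAccess("dr.who", "http://example:42769/static/yarn.css",
        "application_0000000000000_0001", "root");
  }
}
{code}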
[jira] [Commented] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.
[ https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950564#comment-13950564 ] Rohith commented on YARN-1890: -- Below logs are logging at one shot on refresh. {noformat} 2014-03-28 15:48:24,456 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/mapreduce which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,506 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/yarn.css which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,507 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/jquery-1.8.2.min.js which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,508 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/jquery-ui-1.9.1.custom.min.js which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,510 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/dt-1.9.4/js/jquery.dataTables.min.js which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,511 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/jquery-ui.css which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,548 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/dt-1.9.4/css/jui-dt.css which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,626 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/yarn.dt.plugins.js which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,836 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_glass_95_fef1ec_1x400.png which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,841 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_flat_75_ff_40x100.png which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,841 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_highlight-soft_75_cc_1x100.png which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,843 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-icons_88_256x240.png which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,844 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-icons_454545_256x240.png which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,844 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_glass_75_e6e6e6_1x400.png which is the app master GUI of application_1395977591056_0008 owned by root 2014-03-28 15:48:24,871 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_glass_65_ff_1x400.png which is the app master GUI of application_1395977591056_0008 owned by root {noformat} > Too many unnecessary logs are logged while accessing applicationMaster web UI. > -- > > Key: YARN-1890 > URL: https://issues.apache.org/jira/browse/YARN-1890 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith >Assignee: Rohith >Priority: Minor > > Accessing applicationMaster UI which is redirected from RM UI, logs too many > logs in ResourceManager
[jira] [Created] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.
Rohith created YARN-1890: Summary: Too many unnecessary logs are logged while accessing applicationMaster web UI. Key: YARN-1890 URL: https://issues.apache.org/jira/browse/YARN-1890 Project: Hadoop YARN Issue Type: Bug Reporter: Rohith Assignee: Rohith Priority: Minor Accessing applicationMaster UI which is redirected from RM UI, logs too many logs in ResourceManager logs and ProxyServer logs. On every refresh, logging is done at WebAppProxyServlet.doGet(). All my RM and Proxyserver logs are filled with UI information logs which are not really necessary for user. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.
[ https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1703: - Attachment: YARN-1703.2.patch Updated patch, rebased onto the latest code. I verified that it is working after this change. Please review the patch. > Too many connections are opened for proxy server when applicationMaster UI is > accessed. > --- > > Key: YARN-1703 > URL: https://issues.apache.org/jira/browse/YARN-1703 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-1703.1.patch, YARN-1703.2.patch > > > If running job is accessed for progress. there many CLOSE_WAIT connections > for proxyserver. Eventhough connection is released, it makes available > again to the HttpClient instance, but does not close it. -- This message was sent by Atlassian JIRA (v6.2#6252)
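For background on the symptom, here is a self-contained sketch of the release-versus-close distinction, assuming commons-httpclient 3.1 (the library the web proxy uses); this is not the patch itself, just an illustration of why released-but-never-closed connections can linger in CLOSE_WAIT.

{code:java}
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.SimpleHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;

public class ProxyConnectionSketch {

  static void fetch(String uri) throws IOException {
    // A fresh client per request keeps the example simple; the servlet
    // may manage its clients differently.
    HttpClient client = new HttpClient();
    GetMethod method = new GetMethod(uri);
    try {
      client.executeMethod(method);
      System.out.println("status: " + method.getStatusCode());
    } finally {
      // releaseConnection() only returns the connection to the manager;
      // the socket stays open, which is where CLOSE_WAIT piles up.
      method.releaseConnection();
      // Shutting down the connection manager actually closes the socket.
      ((SimpleHttpConnectionManager) client.getHttpConnectionManager()).shutdown();
    }
  }

  public static void main(String[] args) throws IOException {
    fetch(args.length > 0 ? args[0] : "http://localhost:8088/");
  }
}
{code}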
[jira] [Commented] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.
[ https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950552#comment-13950552 ] Rohith commented on YARN-1703: -- I am attaching connection established for accesing 1 application master ui. On every manuall refresh , it is incrementing more than 30 connections. {noformat} host-10-18-40-71:/home/OpenSource/hadoop-3.0.0-SNAPSHOT/sbin # netstat -tnpla | grep tcp0 0 0.0.0.0:49960.0.0.0:* LISTEN 27664/java tcp0 0 10.18.40.71:45029 :::*LISTEN 27664/java tcp1 0 10.18.40.71:58696 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp1 0 10.18.40.71:10375 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp1 0 10.18.40.71:27934 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp0 0 10.18.40.71:29802 10.18.40.77:45022 ESTABLISHED 27664/java tcp0 0 10.18.40.71:43589 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45347 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:10980 ESTABLISHED 27664/java tcp0 0 10.18.40.71:37156 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:37989 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:18116 ESTABLISHED 27664/java tcp0 0 10.18.40.71:50766 10.18.40.71:56297 ESTABLISHED 27664/java tcp1 0 10.18.40.71:18836 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp1 0 10.18.40.71:19376 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp1 0 10.18.40.71:25214 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:54847 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:12605 ESTABLISHED 27664/java tcp1 0 10.18.40.71:50339 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp0 0 10.18.40.71:49870 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:50956 10.18.40.71:56297 ESTABLISHED 27664/java tcp1 0 10.18.40.71:16896 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp0 0 10.18.40.71:43026 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:12091 ESTABLISHED 27664/java tcp0 0 10.18.40.71:49205 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:29968 10.18.40.71:56297 ESTABLISHED 27664/java tcp1 0 10.18.40.71:34355 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp0 0 10.18.40.71:54701 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:47084 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:32938 ESTABLISHED 27664/java tcp1 0 10.18.40.71:28034 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp1 0 10.18.40.71:40430 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:56572 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:60771 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:39165 ESTABLISHED 27664/java tcp0 0 10.18.40.71:20341 10.18.40.71:56297 ESTABLISHED 27664/java tcp1 0 10.18.40.71:22649 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp1 0 10.18.40.71:23049 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp0 0 10.18.40.71:52172 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:57790 ESTABLISHED 27664/java tcp0 0 10.18.40.71:17880 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:22911 ESTABLISHED 27664/java tcp0 0 10.18.40.71:52107 10.18.40.71:56297 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:38345 ESTABLISHED 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:40127 ESTABLISHED 27664/java tcp1 0 10.18.40.71:18287 10.18.40.71:56297 CLOSE_WAIT 27664/java tcp0 0 10.18.40.71:45029 10.18.40.77:30754 ESTABLISHED 27664/java {noformat} > Too many connections are opened for proxy server 
when applicationMaster UI is > accessed. > -
[jira] [Updated] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.
[ https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1703: - Priority: Critical (was: Major) > Too many connections are opened for proxy server when applicationMaster UI is > accessed. > --- > > Key: YARN-1703 > URL: https://issues.apache.org/jira/browse/YARN-1703 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-1703.1.patch > > > If running job is accessed for progress. there many CLOSE_WAIT connections > for proxyserver. Eventhough connection is released, it makes available > again to the HttpClient instance, but does not close it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.
[ https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1703: - Summary: Too many connections are opened for proxy server when applicationMaster UI is accessed. (was: There many CLOSE_WAIT connections for proxy server.) > Too many connections are opened for proxy server when applicationMaster UI is > accessed. > --- > > Key: YARN-1703 > URL: https://issues.apache.org/jira/browse/YARN-1703 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Rohith >Assignee: Rohith > Attachments: YARN-1703.1.patch > > > If running job is accessed for progress. there many CLOSE_WAIT connections > for proxyserver. Eventhough connection is released, it makes available > again to the HttpClient instance, but does not close it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-1889: -- Attachment: YARN-1889.patch > avoid creating new objects on each fair scheduler call to AppSchedulable > comparator > --- > > Key: YARN-1889 > URL: https://issues.apache.org/jira/browse/YARN-1889 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Hong Zhiguo >Priority: Minor > Attachments: YARN-1889.patch > > > In fair scheduler, in each scheduling attempt, a full sort is > performed on List of AppSchedulable, which invokes Comparator.compare > method many times. Both FairShareComparator and DRFComparator call > AppSchedulable.getWeights, and AppSchedulable.getPriority. > A new ResourceWeights object is allocated on each call of getWeights, > and the same for getPriority. This introduces a lot of pressure to > GC because these methods are called very very frequently. > Below test case shows improvement on performance and GC behaviour. The > results show that the GC pressure during processing NodeUpdate is recuded > half by this patch. > The code to show the improvement: (Add it to TestFairScheduler.java) > import java.lang.management.GarbageCollectorMXBean; > import java.lang.management.ManagementFactory; > public void printGCStats() { > long totalGarbageCollections = 0; > long garbageCollectionTime = 0; > for(GarbageCollectorMXBean gc : > ManagementFactory.getGarbageCollectorMXBeans()) { > long count = gc.getCollectionCount(); > if(count >= 0) { > totalGarbageCollections += count; > } > long time = gc.getCollectionTime(); > if(time >= 0) { > garbageCollectionTime += time; > } > } > System.out.println("Total Garbage Collections: " > + totalGarbageCollections); > System.out.println("Total Garbage Collection Time (ms): " > + garbageCollectionTime); > } > @Test > public void testImpactOnGC() throws Exception { > scheduler.reinitialize(conf, resourceManager.getRMContext()); > // Add nodes > int numNode = 1; > for (int i = 0; i < numNode; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, > host); > NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); > scheduler.handle(nodeEvent); > assertEquals(1024 * 64 * (i+1), > scheduler.getClusterCapacity().getMemory()); > } > assertEquals(numNode, scheduler.getNumClusterNodes()); > assertEquals(1024 * 64 * numNode, > scheduler.getClusterCapacity().getMemory()); > // add apps, each app has 100 containers. 
> int minReqSize = > > FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; > int numApp = 8000; > int priority = 1; > for (int i = 1; i < numApp + 1; ++i) { > ApplicationAttemptId attemptId = createAppAttemptId(i, 1); > AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( > attemptId.getApplicationId(), "queue1", "user1"); > scheduler.handle(appAddedEvent); > AppAttemptAddedSchedulerEvent attemptAddedEvent = > new AppAttemptAddedSchedulerEvent(attemptId, false); > scheduler.handle(attemptAddedEvent); > createSchedulingRequestExistingApplication(minReqSize * 2, 1, > priority, attemptId); > } > scheduler.update(); > assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", > true) > .getRunnableAppSchedulables().size()); > System.out.println("GC stats before NodeUpdate processing:"); > printGCStats(); > int hb_num = 5000; > long start = System.nanoTime(); > for (int i = 0; i < hb_num; ++i) { > String host = String.format("192.1.%d.%d", i/256, i%256); > RMNode node = > MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, > host); > NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); > scheduler.handle(nodeEvent); > } > long end = System.nanoTime(); > System.out.printf("processing time for a NodeUpdate in average: %d us\n", > (end - start)/(hb_num * 1000)); > System.out.println("GC stats after NodeUpdate processing:"); > printGCStats(); > } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
Hong Zhiguo created YARN-1889: - Summary: avoid creating new objects on each fair scheduler call to AppSchedulable comparator Key: YARN-1889 URL: https://issues.apache.org/jira/browse/YARN-1889 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Hong Zhiguo Priority: Minor In fair scheduler, in each scheduling attempt, a full sort is performed on List of AppSchedulable, which invokes Comparator.compare method many times. Both FairShareComparator and DRFComparator call AppSchedulable.getWeights, and AppSchedulable.getPriority. A new ResourceWeights object is allocated on each call of getWeights, and the same for getPriority. This introduces a lot of pressure to GC because these methods are called very very frequently. Below test case shows improvement on performance and GC behaviour. The results show that the GC pressure during processing NodeUpdate is recuded half by this patch. The code to show the improvement: (Add it to TestFairScheduler.java) import java.lang.management.GarbageCollectorMXBean; import java.lang.management.ManagementFactory; public void printGCStats() { long totalGarbageCollections = 0; long garbageCollectionTime = 0; for(GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) { long count = gc.getCollectionCount(); if(count >= 0) { totalGarbageCollections += count; } long time = gc.getCollectionTime(); if(time >= 0) { garbageCollectionTime += time; } } System.out.println("Total Garbage Collections: " + totalGarbageCollections); System.out.println("Total Garbage Collection Time (ms): " + garbageCollectionTime); } @Test public void testImpactOnGC() throws Exception { scheduler.reinitialize(conf, resourceManager.getRMContext()); // Add nodes int numNode = 1; for (int i = 0; i < numNode; ++i) { String host = String.format("192.1.%d.%d", i/256, i%256); RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, host); NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); scheduler.handle(nodeEvent); assertEquals(1024 * 64 * (i+1), scheduler.getClusterCapacity().getMemory()); } assertEquals(numNode, scheduler.getNumClusterNodes()); assertEquals(1024 * 64 * numNode, scheduler.getClusterCapacity().getMemory()); // add apps, each app has 100 containers. 
int minReqSize = FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; int numApp = 8000; int priority = 1; for (int i = 1; i < numApp + 1; ++i) { ApplicationAttemptId attemptId = createAppAttemptId(i, 1); AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( attemptId.getApplicationId(), "queue1", "user1"); scheduler.handle(appAddedEvent); AppAttemptAddedSchedulerEvent attemptAddedEvent = new AppAttemptAddedSchedulerEvent(attemptId, false); scheduler.handle(attemptAddedEvent); createSchedulingRequestExistingApplication(minReqSize * 2, 1, priority, attemptId); } scheduler.update(); assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", true) .getRunnableAppSchedulables().size()); System.out.println("GC stats before NodeUpdate processing:"); printGCStats(); int hb_num = 5000; long start = System.nanoTime(); for (int i = 0; i < hb_num; ++i) { String host = String.format("192.1.%d.%d", i/256, i%256); RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, host); NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); scheduler.handle(nodeEvent); } long end = System.nanoTime(); System.out.printf("processing time for a NodeUpdate in average: %d us\n", (end - start)/(hb_num * 1000)); System.out.println("GC stats after NodeUpdate processing:"); printGCStats(); } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port
[ https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950521#comment-13950521 ] Hadoop QA commented on YARN-1888: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637354/YARN-1888.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3482//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3482//console This message is automatically generated. > Not add NodeManager to inactiveRMNodes when reboot NodeManager which have > different port > > > Key: YARN-1888 > URL: https://issues.apache.org/jira/browse/YARN-1888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: zhaoyunjiong >Priority: Minor > Attachments: YARN-1888.patch > > > When NodeManager's port set to 0, reboot NodeManager will cause "Losts Nodes" > inaccurate. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port
[ https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated YARN-1888: --- Attachment: YARN-1888.patch When RMNodeImpl performs the DeactivateNodeTransition, check whether there is already a new NodeManager registered on the same host with a different port; if so, don't add the old node to "Lost Nodes". > Not add NodeManager to inactiveRMNodes when reboot NodeManager which have > different port > > > Key: YARN-1888 > URL: https://issues.apache.org/jira/browse/YARN-1888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: zhaoyunjiong >Priority: Minor > Attachments: YARN-1888.patch > > > When NodeManager's port set to 0, reboot NodeManager will cause "Losts Nodes" > inaccurate. -- This message was sent by Atlassian JIRA (v6.2#6252)
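A rough illustration of that check, assuming it boils down to comparing only the host part of the NodeId; the activeNodes map and the helper below are hypothetical stand-ins for the RM's node-tracking structures, not the code in the attached patch.

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.NodeId;

public class LostNodeCheckSketch {

  /**
   * Returns true if some active node runs on the same host as the
   * deactivated one but registered with a different port, i.e. the
   * NodeManager was restarted with a new ephemeral port.
   */
  static boolean rejoinedWithDifferentPort(NodeId deactivated,
                                           Map<NodeId, String> activeNodes) {
    for (NodeId active : activeNodes.keySet()) {
      if (active.getHost().equals(deactivated.getHost())
          && active.getPort() != deactivated.getPort()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Map<NodeId, String> active = new HashMap<NodeId, String>();
    active.put(NodeId.newInstance("host-1", 45123), "RUNNING");

    NodeId old = NodeId.newInstance("host-1", 38871);
    // If the host re-registered on a new port, don't count it as lost.
    System.out.println("add to lost nodes: "
        + !rejoinedWithDifferentPort(old, active));
  }
}
{code}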
[jira] [Created] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port
zhaoyunjiong created YARN-1888: -- Summary: Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port Key: YARN-1888 URL: https://issues.apache.org/jira/browse/YARN-1888 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: zhaoyunjiong Priority: Minor When NodeManager's port set to 0, reboot NodeManager will cause "Losts Nodes" inaccurate. -- This message was sent by Atlassian JIRA (v6.2#6252)