[jira] [Updated] (YARN-6820) Restrict read access to timelineservice v2 data
[ https://issues.apache.org/jira/browse/YARN-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vrushali C updated YARN-6820:
-
    Attachment: YARN-6820-YARN-5355.002.patch

Attaching patch 002, updated as per review recommendations.

> Restrict read access to timelineservice v2 data
> -
>
> Key: YARN-6820
> URL: https://issues.apache.org/jira/browse/YARN-6820
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Vrushali C
> Labels: yarn-5355-merge-blocker
> Attachments: YARN-6820-YARN-5355.0001.patch, YARN-6820-YARN-5355.002.patch
>
> Need to provide a way to restrict read access in ATSv2. Not all users should
> be able to read all entities. On the flip side, some folks may not need any
> read restrictions, so we need to provide a way to disable this access
> restriction as well.
> Initially this access restriction could be done in a simple way via a
> whitelist of users allowed to read data. That set of users can read all data,
> no other user can read any data. It can be turned off so that all users can
> read all data.
> It could be stored in a "domain" table in HBase perhaps. Or a configuration
> setting for the cluster. Or something else that's simple enough. ATSv1 has a
> concept of domain for isolating users for reading. It would be good to keep
> that in consideration.
> In ATSv1, a domain offers a namespace for the Timeline Server, allowing users
> to host multiple entities, isolating them from other users and applications.
> A “Domain” in ATSv1 primarily stores owner info, read and write ACL
> information, and created and modified timestamp information. Each Domain is
> identified by an ID which must be unique across all users in the YARN cluster.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6947) The implementation of Schedulable#getResourceUsage is so inefficient that it can reduce scheduling performance
[ https://issues.apache.org/jira/browse/YARN-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

YunFan Zhou updated YARN-6947:
--
    Description:
Each time the FairScheduler assigns a container, it checks whether the resources used by the queue exceed its max share. However, the current calculation of a queue's resource usage is particularly inefficient: it recursively iterates over all child nodes, with high time complexity. We can refactor this logic using a lazy-update approach.

{code:java}
@Override
public Resource assignContainer(FSSchedulerNode node) {
  Resource assigned = Resources.none();

  // If this queue is over its limit, reject
  if (!assignContainerPreCheck(node)) {
    return assigned;
  }
{code}
{code:java}
/**
 * Helper method to check if the queue should attempt assigning resources
 *
 * @return true if check passes (can assign) or false otherwise
 */
boolean assignContainerPreCheck(FSSchedulerNode node) {
  if (node.getReservedContainer() != null) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Assigning container failed on node '" + node.getNodeName()
          + " because it has reserved containers.");
    }
    return false;
  } else if (!Resources.fitsIn(getResourceUsage(), maxShare)) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Assigning container failed on node '" + node.getNodeName()
          + " because queue resource usage is larger than MaxShare: "
          + dumpState());
    }
    return false;
  } else {
    return true;
  }
}
{code}
{code:java}
@Override
public Resource getResourceUsage() {
  Resource usage = Resources.createResource(0);
  readLock.lock();
  try {
    for (FSQueue child : childQueues) {
      Resources.addTo(usage, child.getResourceUsage());
    }
  } finally {
    readLock.unlock();
  }
  return usage;
}
{code}

was:
{code:java}
@Override
public Resource assignContainer(FSSchedulerNode node) {
  Resource assigned = Resources.none();

  // If this queue is over its limit, reject
  if (!assignContainerPreCheck(node)) {
    return assigned;
  }
{code}

> The implementation of Schedulable#getResourceUsage is so inefficient that it
> can reduce scheduling performance
> -
>
> Key: YARN-6947
> URL: https://issues.apache.org/jira/browse/YARN-6947
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Reporter: YunFan Zhou
> Priority: Critical
>
> Each time the FairScheduler assigns a container, it checks whether the
> resources used by the queue exceed its max share. However, the current
> calculation of a queue's resource usage is particularly inefficient: it
> recursively iterates over all child nodes, with high time complexity.
> We can refactor this logic using a lazy-update approach.
> {code:java}
> @Override
> public Resource assignContainer(FSSchedulerNode node) {
>   Resource assigned = Resources.none();
>
>   // If this queue is over its limit, reject
>   if (!assignContainerPreCheck(node)) {
>     return assigned;
>   }
> {code}
> {code:java}
> /**
>  * Helper method to check if the queue should attempt assigning resources
>  *
>  * @return true if check passes (can assign) or false otherwise
>  */
> boolean assignContainerPreCheck(FSSchedulerNode node) {
>   if (node.getReservedContainer() != null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Assigning container failed on node '" + node.getNodeName()
>           + " because it has reserved containers.");
>     }
>     return false;
>   } else if (!Resources.fitsIn(getResourceUsage(), maxShare)) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Assigning container failed on node '" + node.getNodeName()
>           + " because queue resource usage is larger than MaxShare: "
>           + dumpState());
>     }
>     return false;
>   } else {
>     return true;
>   }
> }
> {code}
> {code:java}
> @Override
> public Resource getResourceUsage() {
>   Resource usage = Resources.createResource(0);
>   readLock.lock();
>   try {
>     for (FSQueue child : childQueues) {
>       Resources.addTo(usage, child.getResourceUsage());
>     }
>   } finally {
>     readLock.unlock();
>   }
>   return usage;
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-6947) The implementation of Schedulable#getResourceUsage is so inefficient that it can reduce scheduling performance
[ https://issues.apache.org/jira/browse/YARN-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu resolved YARN-6947.
-
    Resolution: Duplicate

> The implementation of Schedulable#getResourceUsage is so inefficient that it
> can reduce scheduling performance
> -
>
> Key: YARN-6947
> URL: https://issues.apache.org/jira/browse/YARN-6947
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Reporter: YunFan Zhou
> Priority: Critical
>
> Each time the FairScheduler assigns a container, it checks whether the
> resources used by the queue exceed its max share. However, the current
> calculation of a queue's resource usage is particularly inefficient: it
> recursively iterates over all child nodes, with high time complexity.
> We can refactor this logic using a lazy-update approach.
> {code:java}
> @Override
> public Resource assignContainer(FSSchedulerNode node) {
>   Resource assigned = Resources.none();
>
>   // If this queue is over its limit, reject
>   if (!assignContainerPreCheck(node)) {
>     return assigned;
>   }
> {code}
> {code:java}
> /**
>  * Helper method to check if the queue should attempt assigning resources
>  *
>  * @return true if check passes (can assign) or false otherwise
>  */
> boolean assignContainerPreCheck(FSSchedulerNode node) {
>   if (node.getReservedContainer() != null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Assigning container failed on node '" + node.getNodeName()
>           + " because it has reserved containers.");
>     }
>     return false;
>   } else if (!Resources.fitsIn(getResourceUsage(), maxShare)) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Assigning container failed on node '" + node.getNodeName()
>           + " because queue resource usage is larger than MaxShare: "
>           + dumpState());
>     }
>     return false;
>   } else {
>     return true;
>   }
> }
> {code}
> {code:java}
> @Override
> public Resource getResourceUsage() {
>   Resource usage = Resources.createResource(0);
>   readLock.lock();
>   try {
>     for (FSQueue child : childQueues) {
>       Resources.addTo(usage, child.getResourceUsage());
>     }
>   } finally {
>     readLock.unlock();
>   }
>   return usage;
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
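The lazy-update approach proposed in YARN-6947 can be sketched as follows. This is an illustrative, simplified example only (the class and method names are hypothetical, not the actual FSQueue API): instead of recursively summing child usage on every getResourceUsage() call, each allocation or release propagates its delta up the queue tree once, so reads become O(1).

```java
import java.util.ArrayList;
import java.util.List;

public class LazyUsageSketch {
    static class Queue {
        final Queue parent;
        final List<Queue> children = new ArrayList<>();
        private long usedMemory; // cached aggregate, kept current on writes

        Queue(Queue parent) {
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        // O(depth) write: bubble the delta up to the root.
        void addUsage(long delta) {
            for (Queue q = this; q != null; q = q.parent) {
                q.usedMemory += delta;
            }
        }

        // O(1) read: no recursion over child queues.
        long getResourceUsage() {
            return usedMemory;
        }
    }

    static long demo() {
        Queue root = new Queue(null);
        Queue a = new Queue(root);
        Queue b = new Queue(root);
        a.addUsage(1024);   // container allocated in queue a
        b.addUsage(2048);   // container allocated in queue b
        a.addUsage(-1024);  // container in queue a released
        return root.getResourceUsage(); // 2048
    }

    public static void main(String[] args) {
        System.out.println("root usage = " + demo());
    }
}
```

The trade-off is that writes become O(depth of the tree), but queue trees are shallow while scheduling reads are extremely frequent, which is why the report calls the recursive read a scheduling bottleneck.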
[jira] [Comment Edited] (YARN-6788) Improve performance of resource profile branch
[ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114043#comment-16114043 ]

Sunil G edited comment on YARN-6788 at 8/4/17 8:13 AM:
---
Thanks [~templedf]. Quick clarification on a few points:

bq. the findbugs warning is worth fixing
The findbugs warning pointed out by Jenkins is one that already exists on YARN-3926. I have fixed it as part of this patch, hence no findbugs warnings are shown with the patch (please refer to the *Patch Compile Tests* part in Jenkins).

bq. don't forget to move TestResourceUtils into yarn-api
There were two fundamental reasons why it is kept in yarn-common:
# ResourceUtils opens files like {{resource-types.xml}}, or any other files, using {{ConfigurationProvider}}. The default ConfigurationProvider class is {{org.apache.hadoop.yarn.LocalConfigurationProvider}}, but this is compiled with the yarn-common package. Due to this, we can't compile TestResourceUtils when it is in yarn-api, since that package is built first per the hadoop-yarn pom (yarn-common is built after yarn-api).
# A bunch of sample resource files are also added as {{testResources}} in the yarn-common pom.xml. I could point to the same dir from the yarn-api pom, or would need to copy/duplicate these resources. This is something we can do (point the lookup to yarn-common resources for junit tests).

I think point 1 is a little tricky and we can leave this test file in yarn-common for now. I could add a comment and details in this file for reference.

bq. checkstyle issue in ResourceUtils
I could handle this in the next patch. I will wait for your comment on the above point before sharing the next patch.

was (Author: sunilg):
Thanks [~templedf] Quick clarification for few points: bq.the findbugs warning is worth fixing Current findbugs warnings pointed by Jenkins is the one which is existing at YARN-3926. I have fixed as part of this patch, hence there are no findbugs warning shown with patch (Please refer *Patch Compile Tests* part in jenkins) bq.don't forget to move TestResourceUtils into yarn-api There were two fundamental reason why its kept in yarn-common # ResourceUtils open files such {{resource-types.xml}} or any other files using {{ConfigurationProvider}}. Default ConfigurationProvider class is {{org.apache.hadoop.yarn.LocalConfigurationProvider}}. But this compiles with yarn-common package. Due to this, we can't compile TestResourceUtils when its in yarn-api since this package is first build as per hadoop-yarn pom (yarn-common is built post yarn-api) # A bunch of sample resource files are added as {{testResources}} in yarn-common pom.xml. I can hard point to same dir from yarn-api or need to copy/duplicate these resource. This something doing (point lookup to yarn-common resources for junit tests) I think point 1 is little tricky and we can leave this file in yarn-common for now. I could add a comment and detail in this file for reference. bq.checkstyle issue in ResourceUtils s I could handle this in next patch. I guess I will wait for your comment for above point before sharing next patch.
> Improve performance of resource profile branch > -- > > Key: YARN-6788 > URL: https://issues.apache.org/jira/browse/YARN-6788 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Sunil G >Assignee: Sunil G >Priority: Blocker > Attachments: YARN-6788-YARN-3926.001.patch, > YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, > YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, > YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch, > YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch, > YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch, > YARN-6788-YARN-3926.012.patch, YARN-6788-YARN-3926.013.patch, > YARN-6788-YARN-3926.014.patch, YARN-6788-YARN-3926.015.patch, > YARN-6788-YARN-3926.016.patch, YARN-6788-YARN-3926.017.patch, > YARN-6788-YARN-3926.018.patch, YARN-6788-YARN-3926.019.patch, > YARN-6788-YARN-3926.020.patch, YARN-6788-YARN-3926.021.patch, > YARN-6788-YARN-3926.022.patch, YARN-6788-YARN-3926.022.patch > > > Currently we could see a 15% performance delta with this branch. > Few performance improvements to improve the same. > Also this patch will handle > [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418] > from [~leftnoteasy]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6920) Fix TestNMClient failure due to YARN-6706
[ https://issues.apache.org/jira/browse/YARN-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114006#comment-16114006 ]

Jian He commented on YARN-6920:
---
I see, thanks for the explanation. So it is guaranteed that the Guaranteed container will be started. But I was wondering if this will cause unnecessary churn, like:
1) CONTAINER_COMPLETED sent
2) opportunistic container started
3) SCHEDULE_CONTAINER sent
4) opportunistic container killed to make room for the original upgrading container

If the above can occur, we could eliminate it by skipping the check for whether an opportunistic container should be launched, so the container upgrade can happen more smoothly.

> Fix TestNMClient failure due to YARN-6706
> -
>
> Key: YARN-6920
> URL: https://issues.apache.org/jira/browse/YARN-6920
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Attachments: YARN-6920.001.patch, YARN-6920.002.patch, YARN-6920.003.patch, YARN-6920.004.patch
>
> Looks like {{TestNMClient}} has been failing for a while. Opening this JIRA
> to track the fix.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6920) Fix TestNMClient failure due to YARN-6706
[ https://issues.apache.org/jira/browse/YARN-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114021#comment-16114021 ]

Arun Suresh commented on YARN-6920:
---
True, the sequence of events you mentioned can happen. I am hoping that once YARN-5972 is completed, freezing and later thawing the opportunistic container using the cgroups freezer / docker pause, rather than simply killing it, will ensure no work is lost (we have seen good results in production on Windows). I can raise a JIRA to optimize the above path and keep it open till we finish YARN-5972. That way, I can get some data and see if an optimization is required. Thoughts?

> Fix TestNMClient failure due to YARN-6706
> -
>
> Key: YARN-6920
> URL: https://issues.apache.org/jira/browse/YARN-6920
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Attachments: YARN-6920.001.patch, YARN-6920.002.patch, YARN-6920.003.patch, YARN-6920.004.patch
>
> Looks like {{TestNMClient}} has been failing for a while. Opening this JIRA
> to track the fix.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6947) The implementation of Schedulable#getResourceUsage is so inefficient that it can reduce scheduling performance
YunFan Zhou created YARN-6947:
-

Summary: The implementation of Schedulable#getResourceUsage is so inefficient that it can reduce scheduling performance
Key: YARN-6947
URL: https://issues.apache.org/jira/browse/YARN-6947
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: YunFan Zhou
Priority: Critical

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6788) Improve performance of resource profile branch
[ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114043#comment-16114043 ]

Sunil G commented on YARN-6788:
---
Thanks [~templedf]. Quick clarification on a few points:

bq. the findbugs warning is worth fixing
The findbugs warning pointed out by Jenkins is one that already exists on YARN-3926. I have fixed it as part of this patch, hence no findbugs warnings are shown with the patch (please refer to the *Patch Compile Tests* part in Jenkins).

bq. don't forget to move TestResourceUtils into yarn-api
There were two fundamental reasons why it is kept in yarn-common:
# ResourceUtils opens files such as {{resource-types.xml}}, or any other files, using {{ConfigurationProvider}}. The default ConfigurationProvider class is {{org.apache.hadoop.yarn.LocalConfigurationProvider}}, but this is compiled with the yarn-common package. Due to this, we can't compile TestResourceUtils when it is in yarn-api, since that package is built first per the hadoop-yarn pom (yarn-common is built after yarn-api).
# A bunch of sample resource files are added as {{testResources}} in the yarn-common pom.xml. I could hard-point to the same dir from yarn-api, or would need to copy/duplicate these resources. This is something we could do (point the lookup to yarn-common resources for junit tests).

I think point 1 is a little tricky and we can leave this file in yarn-common for now. I could add a comment and details in this file for reference.

bq. checkstyle issue in ResourceUtils
I could handle this in the next patch. I will wait for your comment on the above point before sharing the next patch.
> Improve performance of resource profile branch > -- > > Key: YARN-6788 > URL: https://issues.apache.org/jira/browse/YARN-6788 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Sunil G >Assignee: Sunil G >Priority: Blocker > Attachments: YARN-6788-YARN-3926.001.patch, > YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, > YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, > YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch, > YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch, > YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch, > YARN-6788-YARN-3926.012.patch, YARN-6788-YARN-3926.013.patch, > YARN-6788-YARN-3926.014.patch, YARN-6788-YARN-3926.015.patch, > YARN-6788-YARN-3926.016.patch, YARN-6788-YARN-3926.017.patch, > YARN-6788-YARN-3926.018.patch, YARN-6788-YARN-3926.019.patch, > YARN-6788-YARN-3926.020.patch, YARN-6788-YARN-3926.021.patch, > YARN-6788-YARN-3926.022.patch, YARN-6788-YARN-3926.022.patch > > > Currently we could see a 15% performance delta with this branch. > Few performance improvements to improve the same. > Also this patch will handle > [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418] > from [~leftnoteasy]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6873) Moving logging APIs over to slf4j in hadoop-yarn-server-applicationhistoryservice
[ https://issues.apache.org/jira/browse/YARN-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113991#comment-16113991 ]

Wenxin He commented on YARN-6873:
-
[~Cyl], since HADOOP-14706 is committed, could you use the helper method {{isLog4jLogger}} in your patch?

> Moving logging APIs over to slf4j in
> hadoop-yarn-server-applicationhistoryservice
> -
>
> Key: YARN-6873
> URL: https://issues.apache.org/jira/browse/YARN-6873
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Yeliang Cang
> Assignee: Yeliang Cang
> Attachments: YARN-6873.001.patch, YARN-6873.002.patch, YARN-6873.003.patch

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6938) Add a flag to indicate whether timeline server ACL is enabled
[ https://issues.apache.org/jira/browse/YARN-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

YunFan Zhou updated YARN-6938:
--
    Component/s: timelineserver

> Add a flag to indicate whether timeline server ACL is enabled
> -
>
> Key: YARN-6938
> URL: https://issues.apache.org/jira/browse/YARN-6938
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: timelineserver
> Reporter: YunFan Zhou
> Assignee: YunFan Zhou

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6947) The implementation of Schedulable#getResourceUsage is so inefficient that it can reduce scheduling performance
[ https://issues.apache.org/jira/browse/YARN-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

YunFan Zhou updated YARN-6947:
--
    Description:
{code:java}
@Override
public Resource assignContainer(FSSchedulerNode node) {
  Resource assigned = Resources.none();

  // If this queue is over its limit, reject
  if (!assignContainerPreCheck(node)) {
    return assigned;
  }
{code}

> The implementation of Schedulable#getResourceUsage is so inefficient that it
> can reduce scheduling performance
> -
>
> Key: YARN-6947
> URL: https://issues.apache.org/jira/browse/YARN-6947
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Reporter: YunFan Zhou
> Priority: Critical
>
> {code:java}
> @Override
> public Resource assignContainer(FSSchedulerNode node) {
>   Resource assigned = Resources.none();
>
>   // If this queue is over its limit, reject
>   if (!assignContainerPreCheck(node)) {
>     return assigned;
>   }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6820) Restrict read access to timelineservice v2 data
[ https://issues.apache.org/jira/browse/YARN-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114028#comment-16114028 ]

Vrushali C edited comment on YARN-6820 at 8/4/17 7:07 AM:
--
Attaching patch 002, updated as per review recommendations.

I have added two new classes: TimelineReaderWhitelistAuthorizationFilterInitializer and TimelineReaderWhitelistAuthorizationFilter. These are similar to other filter classes in Hadoop. These names feel a bit too lengthy to me; I am wondering if/how to make them shorter.

The filter class now uses AccessControlList to determine whether a user should be allowed. It also checks for admins and allows them to read timeline service v2 data.

I have added unit tests that check users and groups set in the config, similar to the way the yarn admin ACL config params are set. I also ran the other unit tests for the timeline v2 reader webservices and saw that these filters are being invoked.

Thanks [~jrottinghuis] for helping me wade through the code base this afternoon. I will be out for the next 3 days, so will respond to review suggestions after Monday afternoon. (I am yet to update the documentation for this. Will do so in either this jira or the documentation jira YARN-6047.)

was (Author: vrushalic):
Attaching patch 002 , updated as per review recommendations.

> Restrict read access to timelineservice v2 data
> -
>
> Key: YARN-6820
> URL: https://issues.apache.org/jira/browse/YARN-6820
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Vrushali C
> Labels: yarn-5355-merge-blocker
> Attachments: YARN-6820-YARN-5355.0001.patch, YARN-6820-YARN-5355.002.patch
>
> Need to provide a way to restrict read access in ATSv2. Not all users should
> be able to read all entities. On the flip side, some folks may not need any
> read restrictions, so we need to provide a way to disable this access
> restriction as well.
> Initially this access restriction could be done in a simple way via a
> whitelist of users allowed to read data. That set of users can read all data,
> no other user can read any data. It can be turned off so that all users can
> read all data.
> It could be stored in a "domain" table in HBase perhaps. Or a configuration
> setting for the cluster. Or something else that's simple enough. ATSv1 has a
> concept of domain for isolating users for reading. It would be good to keep
> that in consideration.
> In ATSv1, a domain offers a namespace for the Timeline Server, allowing users
> to host multiple entities, isolating them from other users and applications.
> A “Domain” in ATSv1 primarily stores owner info, read and write ACL
> information, and created and modified timestamp information. Each Domain is
> identified by an ID which must be unique across all users in the YARN cluster.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
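The whitelist-plus-admins read check described above can be sketched as follows. This is a minimal illustrative example only (the class name and constructor are hypothetical, not the actual TimelineReaderWhitelistAuthorizationFilter API): configured readers and admins may read, everyone else is rejected, and the restriction can be disabled entirely.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WhitelistReadAuthSketch {
    private final boolean enabled;
    private final Set<String> allowedReaders;
    private final Set<String> admins;

    WhitelistReadAuthSketch(boolean enabled, Set<String> readers,
        Set<String> admins) {
        this.enabled = enabled;
        this.allowedReaders = readers;
        this.admins = admins;
    }

    // Admins and whitelisted readers may read; with the restriction
    // disabled, everyone may read.
    boolean canRead(String user) {
        if (!enabled) {
            return true;
        }
        return admins.contains(user) || allowedReaders.contains(user);
    }

    public static void main(String[] args) {
        WhitelistReadAuthSketch auth = new WhitelistReadAuthSketch(
            true,
            new HashSet<>(Arrays.asList("alice", "bob")),
            new HashSet<>(Arrays.asList("yarnadmin")));
        System.out.println(auth.canRead("alice"));     // true: whitelisted
        System.out.println(auth.canRead("yarnadmin")); // true: admin
        System.out.println(auth.canRead("mallory"));   // false: not listed
    }
}
```

In the actual patch the equivalent check would presumably be backed by Hadoop's AccessControlList (which also handles group membership), but the decision logic is the same shape.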
[jira] [Commented] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114094#comment-16114094 ]

Yufei Gu commented on YARN-6361:
--
You can take it.

> FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big
> queues
> -
>
> Key: YARN-6361
> URL: https://issues.apache.org/jira/browse/YARN-6361
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Miklos Szegedi
> Assignee: Yufei Gu
> Priority: Minor
> Attachments: dispatcherthread.png, threads.png
>
> FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy.
> Most of the time is spent in FairShareComparator.compare. We could improve
> this by doing the calculations outside the sort loop ({{O\(n\)}}) and sorting
> by the precomputed fixed number inside it ({{O(n*log\(n\))}} comparisons).
> This could be a performance issue when there is a huge number of applications
> in a single queue. The attachments show the performance impact when there are
> 10k applications in one queue.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114061#comment-16114061 ]

YunFan Zhou commented on YARN-6361:
---
[~yufeigu] Hi Yufei, I would like to work on this JIRA. If you have not yet started working on it, please let me know whether I can take it over. Thank you.

> FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big
> queues
> -
>
> Key: YARN-6361
> URL: https://issues.apache.org/jira/browse/YARN-6361
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Miklos Szegedi
> Assignee: Yufei Gu
> Priority: Minor
> Attachments: dispatcherthread.png, threads.png
>
> FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy.
> Most of the time is spent in FairShareComparator.compare. We could improve
> this by doing the calculations outside the sort loop ({{O\(n\)}}) and sorting
> by the precomputed fixed number inside it ({{O(n*log\(n\))}} comparisons).
> This could be a performance issue when there is a huge number of applications
> in a single queue. The attachments show the performance impact when there are
> 10k applications in one queue.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
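The optimization proposed in YARN-6361 — compute each app's comparison key once, O(n), then sort by the cached key instead of recomputing shares inside the comparator, which runs O(n log n) times — can be sketched like this. All names here are illustrative stand-ins, not the actual FairShareComparator code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PrecomputedSortSketch {
    static class App {
        final String id;
        final double demand; // stands in for the expensive fair-share inputs
        App(String id, double demand) { this.id = id; this.demand = demand; }
    }

    static class Keyed {
        final App app;
        final double key; // computed once, outside the sort loop
        Keyed(App app, double key) { this.app = app; this.key = key; }
    }

    // Stand-in for the expensive part of the comparator; the real code
    // would combine usage, weights, min share, etc.
    static double expensiveKey(App a) {
        return a.demand;
    }

    static List<String> sortApps(List<App> apps) {
        List<Keyed> keyed = new ArrayList<>();
        for (App a : apps) {
            keyed.add(new Keyed(a, expensiveKey(a))); // O(n) key computation
        }
        // O(n log n) comparisons, but each compare is now a cheap double compare.
        keyed.sort(Comparator.comparingDouble((Keyed k) -> k.key));
        List<String> order = new ArrayList<>();
        for (Keyed k : keyed) {
            order.add(k.app.id);
        }
        return order;
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>();
        apps.add(new App("app2", 2.0));
        apps.add(new App("app1", 1.0));
        apps.add(new App("app3", 3.0));
        System.out.println(sortApps(apps)); // [app1, app2, app3]
    }
}
```

With 10k apps in one queue, this moves the heavy share arithmetic from roughly n log n comparator invocations down to n key computations, which matches the attached profiling observation that FairShareComparator.compare dominates CPU time.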
[jira] [Updated] (YARN-6951) Fix debug log when Resource handler chain is enabled
[ https://issues.apache.org/jira/browse/YARN-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Wang updated YARN-6951:
-
    Description:
{code:title=LinuxContainerExecutor.java}
... ...
    if (LOG.isDebugEnabled()) {
      LOG.debug("Resource handler chain enabled = "
          + (resourceHandlerChain == null));
    }
... ...
{code}
I think it is just a typo. When resourceHandlerChain is not null, it should print the log "Resource handler chain enabled = true".

was:
{code title=LinuxContainerExecutor.java} ... ... if (LOG.isDebugEnabled()) { LOG.debug("Resource handler chain enabled = " + (resourceHandlerChain == null)); } ... ... {code} I think it is just a typo.When resourceHandlerChain is not null, print the log "Resource handler chain enabled = true".

> Fix debug log when Resource handler chain is enabled
> -
>
> Key: YARN-6951
> URL: https://issues.apache.org/jira/browse/YARN-6951
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yang Wang
>
> {code:title=LinuxContainerExecutor.java}
> ... ...
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource handler chain enabled = "
>           + (resourceHandlerChain == null));
>     }
> ... ...
> {code}
> I think it is just a typo. When resourceHandlerChain is not null, it should
> print the log "Resource handler chain enabled = true".

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6951) Fix debug log when Resource handler chain is enabled
[ https://issues.apache.org/jira/browse/YARN-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Wang reassigned YARN-6951:
---
    Assignee: Yang Wang

> Fix debug log when Resource handler chain is enabled
> -
>
> Key: YARN-6951
> URL: https://issues.apache.org/jira/browse/YARN-6951
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yang Wang
> Assignee: Yang Wang
> Attachments: YARN-6951.001.patch
>
> {code:title=LinuxContainerExecutor.java}
> ... ...
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource handler chain enabled = "
>           + (resourceHandlerChain == null));
>     }
> ... ...
> {code}
> I think it is just a typo. When resourceHandlerChain is not null, it should
> print the log "Resource handler chain enabled = true".

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6133) [ATSv2 Security] Renew delegation token for app automatically if an app collector is active
[ https://issues.apache.org/jira/browse/YARN-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114235#comment-16114235 ]

Rohith Sharma K S commented on YARN-6133:
-
Thanks @varun for the patch. Some comments:
# The token is renewed just 10 seconds before expiry. Should this be increased?
# TimelineCollectorManager has introduced a synchronized block. This is not necessary, right?
# The renewer thread count is 1. Given that the load on the NM is not much, one thread can handle the renewals, but I would suggest keeping it at 50?

> [ATSv2 Security] Renew delegation token for app automatically if an app
> collector is active
> -
>
> Key: YARN-6133
> URL: https://issues.apache.org/jira/browse/YARN-6133
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Varun Saxena
> Assignee: Varun Saxena
> Labels: yarn-5355-merge-blocker
> Attachments: YARN-6133-YARN-5355.01.patch, YARN-6133-YARN-5355.02.patch, YARN-6133-YARN-5355.03.patch

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6951) Fix debug log when Resource handler chain is enabled
Yang Wang created YARN-6951: --- Summary: Fix debug log when Resource handler chain is enabled Key: YARN-6951 URL: https://issues.apache.org/jira/browse/YARN-6951 Project: Hadoop YARN Issue Type: Bug Reporter: Yang Wang {code:title=LinuxContainerExecutor.java} ... ... if (LOG.isDebugEnabled()) { LOG.debug("Resource handler chain enabled = " + (resourceHandlerChain == null)); } ... ... {code} I think it is just a typo. When resourceHandlerChain is not null, the log should print "Resource handler chain enabled = true". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6951) Fix debug log when Resource handler chain is enabled
[ https://issues.apache.org/jira/browse/YARN-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang updated YARN-6951: Attachment: YARN-6951.001.patch > Fix debug log when Resource handler chain is enabled > > > Key: YARN-6951 > URL: https://issues.apache.org/jira/browse/YARN-6951 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang > Attachments: YARN-6951.001.patch > > > {code:title=LinuxContainerExecutor.java} > ... ... > if (LOG.isDebugEnabled()) { > LOG.debug("Resource handler chain enabled = " + (resourceHandlerChain > == null)); > } > ... ... > {code} > I think it is just a typo. When resourceHandlerChain is not null, the log > should print "Resource handler chain enabled = true". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
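The fix being proposed is a one-character flip of the null check. A minimal sketch (the `describe` helper is illustrative, not a method of the real LinuxContainerExecutor):

```java
// Sketch of the corrected condition from YARN-6951. The original snippet
// logged "enabled = true" exactly when resourceHandlerChain was null;
// flipping the comparison to != null makes the message match reality.
public class DebugLogFix {
    public static String describe(Object resourceHandlerChain) {
        // Fixed: report true only when a handler chain is actually present.
        return "Resource handler chain enabled = " + (resourceHandlerChain != null);
    }

    public static void main(String[] args) {
        System.out.println(describe(new Object())); // chain present
        System.out.println(describe(null));         // chain absent
    }
}
```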
[jira] [Commented] (YARN-6133) [ATSv2 Security] Renew delegation token for app automatically if an app collector is active
[ https://issues.apache.org/jira/browse/YARN-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114248#comment-16114248 ] Varun Saxena commented on YARN-6133: bq. Token is renewed just before 10 seconds. Should it be increased? What do you suggest? 10 seconds should be enough, as we renew only in the DT manager, i.e. internally in the NM. The token doesn't need to go to the AM. Right? bq. TimelineCollectorManager has introduced synchronized block. This is not necessary right.? This is to avoid a race between the collector stopping and the renewal timer expiring, so that an additional renewal timer is not set unnecessarily. It has no functional impact even if we do set one, because it just won't find the collector on expiry. But I thought it better to avoid it altogether. Thoughts? bq. Renewer threads count is 1. Given load on NM not much, one thread can renew it. But I would suggest to keep it to 50? How many active collectors do we expect on one NM? Token renewal and token generation are not very heavy tasks either. Assuming we have 1000 active apps in, say, a large 5000-node cluster, the AMs will be distributed across multiple nodes, so it is unlikely you will have more than 4-5 app collectors running on any NM at a particular moment. And even then it is unlikely that all collectors will have their token renewals expire at the same moment. There are no guarantees, but it is unlikely. We may, however, have a situation where we launch AMs on a particular node partition; in that case there might be some hotspotting, as in multiple app collectors on one node. But even there, 50 might be too many, I think. We can keep a value higher than 1 if you have concerns with only 1 thread, maybe 3-5. Keep it configurable with a default of 3 or 5? 
> [ATSv2 Security] Renew delegation token for app automatically if an app > collector is active > --- > > Key: YARN-6133 > URL: https://issues.apache.org/jira/browse/YARN-6133 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-6133-YARN-5355.01.patch, > YARN-6133-YARN-5355.02.patch, YARN-6133-YARN-5355.03.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
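The scheme under discussion (renew each app collector's token a fixed margin before expiry, from a small configurable renewer pool) can be sketched roughly as follows. All names here are illustrative, not the identifiers used in the actual YARN-6133 patch:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hedged sketch of delegation-token renewal scheduling: renew 10 seconds
// before expiry, using a small thread pool whose size is configurable
// (the 1 vs. 50 vs. 3-5 question debated above).
public class TokenRenewalSketch {
    static final long RENEWAL_MARGIN_MS = 10_000L; // "renew 10 seconds early"
    private final ScheduledExecutorService renewers;

    public TokenRenewalSketch(int renewerThreads) { // e.g. configurable, default 3-5
        this.renewers = Executors.newScheduledThreadPool(renewerThreads);
    }

    // How long to wait before renewing a token that expires at expiryMs.
    static long renewalDelayMs(long expiryMs, long nowMs) {
        return Math.max(0L, expiryMs - nowMs - RENEWAL_MARGIN_MS);
    }

    public void scheduleRenewal(String appId, long expiryMs) {
        long delay = renewalDelayMs(expiryMs, System.currentTimeMillis());
        renewers.schedule(() -> renewToken(appId), delay, TimeUnit.MILLISECONDS);
    }

    private void renewToken(String appId) {
        // Placeholder: the real code renews via the NM-internal DT manager.
    }
}
```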
[jira] [Commented] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114258#comment-16114258 ] YunFan Zhou commented on YARN-6361: --- [~yufeigu] Thank you, Yufei. On this question, and on optimizing FairScheduler scheduling performance in general, I have the following ideas, which I have applied to our production environment. Scheduling performance is good, and the container assignment rate can reach 5000 ~ 1 per second when the aggregate resource demand of the cluster is high. Here's what I do: * Avoid frequent sorting; it is pointless and a waste of time to sort before each container assignment, because after each assignment the child nodes of the queue basically stay in order. And we don't really need to guarantee that fair shares are strictly honored: even if we sort before each container assignment, *FSQueue#demand* was updated in the previous *FairScheduler#update* cycle, so the demand value is not real time, which already means the fair share is not strict. So we can instead sort all the queues in the *FairScheduler#update* cycle (by default every 0.5 s), which is worth doing. Since we cannot achieve a strictly fair share anyway, why not sacrifice some fair-scheduler semantics in exchange for better performance? * Improve the performance of the *Schedulable#getResourceUsage* calculation, making its complexity O(1). Besides these, there are several smaller but especially useful optimization points. I don't know whether you can accept this; if so, I will list a few more detailed points later. 
> FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: Yufei Gu >Priority: Minor > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. > Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop {{(O\(n\))}} and sorting > by a precomputed number inside it instead {{O(n*log\(n\))}}. This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
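The O(1) {{Schedulable#getResourceUsage}} idea amounts to incremental bookkeeping: maintain a running aggregate that is adjusted on allocate/release instead of summing over children on every read. A hedged, single-dimension simplification (memory only; the real {{Resource}} type tracks several dimensions, and the class name here is invented):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simplification of the caching idea: adjust a running total
// on each container allocation/release so that reads never walk the list
// of child apps/queues.
class CachedUsageQueue {
    private final AtomicLong usedMemoryMb = new AtomicLong();

    void containerAllocated(long memMb) { usedMemoryMb.addAndGet(memMb); }
    void containerReleased(long memMb)  { usedMemoryMb.addAndGet(-memMb); }

    // O(1): the aggregate is maintained incrementally, not recomputed.
    long getResourceUsageMb() { return usedMemoryMb.get(); }
}
```

With this in place, the comparator used by the periodic sort can read usage cheaply, which is what makes sorting once per update cycle (rather than per assignment) attractive.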
[jira] [Assigned] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YunFan Zhou reassigned YARN-6361: - Assignee: YunFan Zhou (was: Yufei Gu) > FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: YunFan Zhou >Priority: Minor > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. > Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop {{(O\(n\))}} and sorting > by a precomputed number inside it instead {{O(n*log\(n\))}}. This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
lujie created YARN-6948: --- Summary: Invalid event: ATTEMPT_ADDED at FINAL_SAVING Key: YARN-6948 URL: https://issues.apache.org/jira/browse/YARN-6948 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: lujie When I send a kill command to a running job and check the logs, I find this exception: {code:java} 2017-08-03 01:35:20,485 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_ADDED at FINAL_SAVING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6950) Invalid event: LAUNCH_FAILED at FAILED
lujie created YARN-6950: --- Summary: Invalid event: LAUNCH_FAILED at FAILED Key: YARN-6950 URL: https://issues.apache.org/jira/browse/YARN-6950 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: lujie An RMAppAttemptImpl fails for some reason; meanwhile the AM fails to launch a container and sends a LAUNCH_FAILED event, which the state machine cannot handle: {code:java} 2017-07-05 03:33:09,013 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
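These "Invalid event X at state Y" reports all stem from the same mechanism: YARN's attempt/resource state machines are driven by a transition table keyed on (current state, event), and an event with no entry for the current state raises an exception. A toy illustration (the class and method names below are invented for the sketch, not YARN's StateMachineFactory API):

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of a transition-table state machine. An event that has no
// registered transition from the current state triggers an error analogous
// to YARN's InvalidStateTransitonException.
class TinyStateMachine {
    private final Map<String, String> transitions = new HashMap<>();
    private String state;

    TinyStateMachine(String initial) { this.state = initial; }

    void addTransition(String from, String event, String to) {
        transitions.put(from + "/" + event, to);
    }

    void handle(String event) {
        String next = transitions.get(state + "/" + event);
        if (next == null) { // e.g. LAUNCH_FAILED arriving while already FAILED
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
        state = next;
    }

    String getState() { return state; }
}
```

The fix for each such JIRA is typically either to add the missing transition or to register the event as ignorable in that state.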
[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114150#comment-16114150 ] Yang Wang commented on YARN-5621: - {code:title=LinuxContainerExecutor.java} protected void createSymlinkAsUser(String user, File privateScriptFile, String userScriptFile) throws PrivilegedOperationException { String runAsUser = getRunAsUser(user); ... ... {code} I think we should use containerUser instead of runAsUser here, because it may cause an "Invalid command" error in container-executor when getRunAsUser returns nonsecureLocalUser. > Support LinuxContainerExecutor to create symlinks for continuously localized > resources > -- > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jian He >Assignee: Jian He > Labels: oct16-hard > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, > YARN-5621.4.patch, YARN-5621.5.patch > > > When new resources are localized, a new symlink needs to be created for each > localized resource. This is the change for the LinuxContainerExecutor to > create the symlinks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114150#comment-16114150 ] Yang Wang edited comment on YARN-5621 at 8/4/17 9:08 AM: - {code:title=LinuxContainerExecutor.java} protected void createSymlinkAsUser(String user, File privateScriptFile, String userScriptFile) throws PrivilegedOperationException { String runAsUser = getRunAsUser(user); ... ... {code} Hi [~jianhe], I think we should use containerUser instead of runAsUser here, because it may cause an "Invalid command" error in container-executor when getRunAsUser returns nonsecureLocalUser. was (Author: fly_in_gis): {code:title=LinuxContainerExecutor.java} protected void createSymlinkAsUser(String user, File privateScriptFile, String userScriptFile) throws PrivilegedOperationException { String runAsUser = getRunAsUser(user); ... ... {code} I think we should use containerUser instead of runAsUser here. Because it may cause "Invalid command" in container-executor when getRunAsUser return nonsecureLocalUser. > Support LinuxContainerExecutor to create symlinks for continuously localized > resources > -- > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jian He >Assignee: Jian He > Labels: oct16-hard > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, > YARN-5621.4.patch, YARN-5621.5.patch > > > When new resources are localized, a new symlink needs to be created for each > localized resource. This is the change for the LinuxContainerExecutor to > create the symlinks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6949) Invalid event: LOCALIZED at LOCALIZED
lujie created YARN-6949: --- Summary: Invalid event: LOCALIZED at LOCALIZED Key: YARN-6949 URL: https://issues.apache.org/jira/browse/YARN-6949 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: lujie While a job is running, I stop the NodeManager on one machine for some reason. When I then check the logs to see the running state, I find many InvalidStateTransitionExceptions: {code:java} org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: LOCALIZATION_FAILED at LOCALIZED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource.handle(LocalizedResource.java:198) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.handle(LocalResourcesTrackerImpl.java:194) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.handle(LocalResourcesTrackerImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1058) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:720) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:355) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6726) Fix issues with docker commands executed by container-executor
[ https://issues.apache.org/jira/browse/YARN-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114160#comment-16114160 ] Sunil G commented on YARN-6726: --- Sorry for pitching in late here. Some minor comments: # It would be better if we could write to LOGFILE or ERRORFILE about any {{regex_match}} failure from the {{validate_docker_image_name}} method. It could help us get more information about regex failures, if any. # A suggestion: {{validate_docker_image_name}} could also take {{regex_str}} as input. In that case we could reuse this method for any future regex matching. # validate_container_id could take a const param. # I think I am missing something. Could you please share why we need a prefix of UTILS here; is this a standard? {{#ifndef _UTILS_STRING_UTILS_H_}} > Fix issues with docker commands executed by container-executor > -- > > Key: YARN-6726 > URL: https://issues.apache.org/jira/browse/YARN-6726 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-6726.001.patch, YARN-6726.002.patch > > > docker inspect, rm, stop, etc. are issued through container-executor. Commands > other than docker run are not functioning properly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6788) Improve performance of resource profile branch
[ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114705#comment-16114705 ] Sunil G commented on YARN-6788: --- Thank you very much for thorough reviews and commit [~templedf] and [~leftnoteasy]. Really appreciate the same. > Improve performance of resource profile branch > -- > > Key: YARN-6788 > URL: https://issues.apache.org/jira/browse/YARN-6788 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Sunil G >Assignee: Sunil G >Priority: Blocker > Fix For: YARN-3926 > > Attachments: YARN-6788-YARN-3926.001.patch, > YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, > YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, > YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch, > YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch, > YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch, > YARN-6788-YARN-3926.012.patch, YARN-6788-YARN-3926.013.patch, > YARN-6788-YARN-3926.014.patch, YARN-6788-YARN-3926.015.patch, > YARN-6788-YARN-3926.016.patch, YARN-6788-YARN-3926.017.patch, > YARN-6788-YARN-3926.018.patch, YARN-6788-YARN-3926.019.patch, > YARN-6788-YARN-3926.020.patch, YARN-6788-YARN-3926.021.patch, > YARN-6788-YARN-3926.022.patch, YARN-6788-YARN-3926.022.patch > > > Currently we could see a 15% performance delta with this branch. > Few performance improvements to improve the same. > Also this patch will handle > [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418] > from [~leftnoteasy]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6871) Add additional deSelects params in getAppReport
[ https://issues.apache.org/jira/browse/YARN-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114710#comment-16114710 ] Giovanni Matteo Fumarola commented on YARN-6871: Thanks [~tanujnay] for the patch. A few comments: * I still think the formatting is not correctly set. Let me sync with you offline. * You have a few checkstyle warnings. The [LineLength]s may be fixed with the correct formatting. * You have 2 unit tests that timed out. Please validate on your dev box that these tests pass successfully with your patch. > Add additional deSelects params in getAppReport > --- > > Key: YARN-6871 > URL: https://issues.apache.org/jira/browse/YARN-6871 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager, router >Reporter: Giovanni Matteo Fumarola >Assignee: Tanuj Nayak > Attachments: YARN-6871.002.patch, YARN-6871.proto.patch > > > This jira tracks the effort to add additional deSelect params to the > GetAppReport to make it lighter and faster. > With the current one we are facing scalability issues. > E.g. with ~500 applications running, the AppReport can reach up to 300MB in > size due to the {{ResourceRequest}} in the {{AppInfo}}. > Yarn RM will return the new result faster, and it will use fewer compute cycles > to create the report, improving the YARN RM's and Client's > performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-3254: --- Attachment: YARN-3254-005.patch > HealthReport should include disk full information > - > > Key: YARN-3254 > URL: https://issues.apache.org/jira/browse/YARN-3254 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Akira Ajisaka >Assignee: Suma Shivaprasad > Fix For: 3.0.0-beta1 > > Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot > 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch, > YARN-3254-003.patch, YARN-3254-004.patch, YARN-3254-005.patch > > > When a NodeManager's local disk gets almost full, the NodeManager sends a > health report to ResourceManager that "local/log dir is bad" and the message > is displayed on ResourceManager Web UI. It's difficult for users to detect > why the dir is bad. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114935#comment-16114935 ] Suma Shivaprasad edited comment on YARN-3254 at 8/4/17 8:30 PM: Updated the patch to display the exact root cause for the disk error/capacity exceeded cases. These error diagnostics were already available in DirectoryCollection earlier but were ignored and not surfaced in the health report. Along with the ratio of disks marked as unhealthy for local/log dirs, the reason why each of them was marked unhealthy will be surfaced in the health report. Sample errors below {noformat} 1/1 local-dirs have errors: [ /invalidDir1 : Cannot create directory: /invalidDir1 ] 1/1 log-dirs usable space is below configured utilization percentage/no more usable space [ /hadoop-3.0.0-beta1-SNAPSHOT/logs/userlogs : used space above threshold of 1.0% ] {noformat} was (Author: suma.shivaprasad): Updated the patch to display the exact root cause for the disk error/capacity exceeded cases. These error diagnostics were already available in DirectoryCollection earlier but was ignored and not surfaced in the health report. Along with the ratio of disks marked as unhealthy for local/log dirs, the reason why each of them was marked unhealthy will be surfaced in health report. 
Sample errors below {noformat} 1/1 local-dirs have errors: [ /invalidDir1 : Cannot create directory: /invalidDir1 ] 1/1 log-dirs usable space is below configured utilization percentage/no more usable space [ /hadoop-3.0.0-beta1-SNAPSHOT/logs/userlogs : used space above threshold of 1.0% ] > HealthReport should include disk full information > - > > Key: YARN-3254 > URL: https://issues.apache.org/jira/browse/YARN-3254 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Akira Ajisaka >Assignee: Suma Shivaprasad > Fix For: 3.0.0-beta1 > > Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot > 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch, > YARN-3254-003.patch, YARN-3254-004.patch, YARN-3254-005.patch > > > When a NodeManager's local disk gets almost full, the NodeManager sends a > health report to ResourceManager that "local/log dir is bad" and the message > is displayed on ResourceManager Web UI. It's difficult for users to detect > why the dir is bad. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6955) Concurrent registerAM thread in Federation Interceptor
[ https://issues.apache.org/jira/browse/YARN-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-6955: --- Attachment: YARN-6955.v1.patch > Concurrent registerAM thread in Federation Interceptor > -- > > Key: YARN-6955 > URL: https://issues.apache.org/jira/browse/YARN-6955 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6955.v1.patch > > > The timeout between AM and AMRMProxy is shorter than the timeout + failover > between FederationInterceptor (AMRMProxy) and RM. When the first register > thread in FI is blocked because of an RM failover, the AM can time out and resend > the register call, leading to two outstanding register calls inside FI. > Eventually, when the RM comes back up, one thread succeeds in registering and the > other thread gets an application-already-registered exception. FI should swallow > the exception and return success back to the AM in both threads. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6820) Restrict read access to timelineservice v2 data
[ https://issues.apache.org/jira/browse/YARN-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114990#comment-16114990 ] Jason Lowe commented on YARN-6820: -- Thanks for updating the patch! The javadoc errors are relevant: {noformat} [ERROR] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java:2119: error: bad HTML entity [ERROR] * The name for setting that lists the users & groups who are allowed to [ERROR] ^ [ERROR] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java:2122: error: bad HTML entity [ERROR] * It will allow this list of users & groups to read the data {noformat} There's no default value constant for TIMELINE_SERVICE_READ_AUTH_ENABLED but there probably should be one. DEFAULT_TIMELINE_SERVICE_READ_ALLOWED_USERS is defined but never used. I think it'd be simpler to always have an admin acl (so no need for null check), initializing it with a default value of an empty string if the YARN_ADMIN_ACL property is not set. It would be nice to have a unit test that verifies that even if a user not in the whitelist tries to perform a read it will be allowed if the master enable is off. > Restrict read access to timelineservice v2 data > > > Key: YARN-6820 > URL: https://issues.apache.org/jira/browse/YARN-6820 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-6820-YARN-5355.0001.patch, > YARN-6820-YARN-5355.002.patch > > > Need to provide a way to restrict read access in ATSv2. Not all users should > be able to read all entities. On the flip side, some folks may not need any > read restrictions, so we need to provide a way to disable this access > restriction as well. 
> Initially this access restriction could be done in a simple way via a > whitelist of users allowed to read data. That set of users can read all data, > no other user can read any data. Can be turned off for all users to read all > data. > Could be stored in a "domain" table in hbase perhaps. Or a configuration > setting for the cluster. Or something else that's simple enough. ATSv1 has a > concept of domain for isolating users for reading. Would be good to keep that > in consideration. > In ATSv1, domain offers a namespace for Timeline server allowing users to > host multiple entities, isolating them from other users and applications. A > “Domain” in ATSv1 primarily stores owner info, read and write ACL > information, and created and modified timestamp information. Each Domain is > identified by an ID which must be unique across all users in the YARN cluster. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
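The behavior under review (a master enable plus a user whitelist, with reads always allowed when the enable is off) can be sketched as below. This is a hedged illustration with invented names; it is not the actual patch's YarnConfiguration keys or classes, and admin-ACL handling from the review is omitted for brevity:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Illustrative-only model of the proposed ATSv2 read authorization: when
// the master enable is off, every user may read all data; when on, only
// whitelisted users may read.
class TimelineReadAuthSketch {
    private final boolean readAuthEnabled;
    private final Set<String> allowedUsers;

    TimelineReadAuthSketch(boolean readAuthEnabled, Collection<String> allowedUsers) {
        this.readAuthEnabled = readAuthEnabled;
        this.allowedUsers = new HashSet<>(allowedUsers);
    }

    boolean canRead(String user) {
        if (!readAuthEnabled) {
            return true; // master enable off: all users can read all data
        }
        return allowedUsers.contains(user);
    }
}
```

The last review point above corresponds to asserting that `canRead` returns true for a non-whitelisted user whenever the enable flag is false.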
[jira] [Resolved] (YARN-6944) The comment about ResourceManager#createPolicyMonitors lies
[ https://issues.apache.org/jira/browse/YARN-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu resolved YARN-6944. Resolution: Duplicate > The comment about ResourceManager#createPolicyMonitors lies > --- > > Key: YARN-6944 > URL: https://issues.apache.org/jira/browse/YARN-6944 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.1, 3.0.0-alpha3 >Reporter: Yufei Gu >Priority: Trivial > > {code} > // creating monitors that handle preemption > createPolicyMonitors(); > {code} > Monitors don't handle preemption. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6952) Enable scheduling monitor in FS
[ https://issues.apache.org/jira/browse/YARN-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114904#comment-16114904 ] Yufei Gu commented on YARN-6952: Uploaded patch v1. Filed YARN-6954 to remove interface PreemptableResourceScheduler. > Enable scheduling monitor in FS > --- > > Key: YARN-6952 > URL: https://issues.apache.org/jira/browse/YARN-6952 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, resourcemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6952.001.patch > > > {{SchedulingEditPolicy#init}} doesn't need to take interface > {{PreemptableResourceScheduler}} as the scheduler input. A ResourceScheduler > is good enough. With that change, fair scheduler is able to use scheduling > monitor(e.g. invariant checks) as CS does. Further more, there is no need for > interface {{PreemptableResourceScheduler}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6955) Concurrent registerAM thread in Federation Interceptor
Botong Huang created YARN-6955: -- Summary: Concurrent registerAM thread in Federation Interceptor Key: YARN-6955 URL: https://issues.apache.org/jira/browse/YARN-6955 Project: Hadoop YARN Issue Type: Bug Reporter: Botong Huang Assignee: Botong Huang Priority: Minor The timeout between AM and AMRMProxy is shorter than the timeout + failover between FederationInterceptor (AMRMProxy) and RM. When the first register thread in FI is blocked because of an RM failover, the AM can time out and resend the register call, leading to two outstanding register calls inside FI. Eventually, when the RM comes back up, one thread's register succeeds and the other thread gets an "application already registered" exception. FI should swallow the exception and return success to the AM in both threads. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
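The proposed behavior — first register wins, a duplicate register is swallowed and reported as success — can be sketched as follows. This is a simplified stand-in, not the actual FederationInterceptor code; the class and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified stand-in for the behavior proposed above (names are illustrative,
// not the actual FederationInterceptor code): the first register call performs
// the real registration; a concurrent duplicate, which the RM would reject as
// "application already registered", is swallowed so both threads report
// success back to the AM.
public class RegisterOnce {
  private final AtomicBoolean registered = new AtomicBoolean(false);

  /** Returns true ("success") for both the first and any duplicate register. */
  public boolean registerApplicationMaster() {
    if (registered.compareAndSet(false, true)) {
      // First thread wins and performs the real registration with the RM.
      return true;
    }
    // A later thread: the RM would throw "application already registered";
    // swallow that outcome and report success, since the registration the
    // AM wanted has in fact happened.
    return true;
  }
}
```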
[jira] [Updated] (YARN-6033) Add support for sections in container-executor configuration file
[ https://issues.apache.org/jira/browse/YARN-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6033: - Attachment: YARN-6033.009.patch Attached ver.009 patch, fixed warnings. > Add support for sections in container-executor configuration file > - > > Key: YARN-6033 > URL: https://issues.apache.org/jira/browse/YARN-6033 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-6033.003.patch, YARN-6033.004.patch, > YARN-6033.005.patch, YARN-6033.006.patch, YARN-6033.007.patch, > YARN-6033.008.patch, YARN-6033.009.patch, YARN-6033-YARN-5673.001.patch, > YARN-6033-YARN-5673.002.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6955) Concurrent registerAM thread in Federation Interceptor
[ https://issues.apache.org/jira/browse/YARN-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-6955: --- Attachment: (was: YARN-6955.v1.patch) > Concurrent registerAM thread in Federation Interceptor > -- > > Key: YARN-6955 > URL: https://issues.apache.org/jira/browse/YARN-6955 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > > The timeout between AM and AMRMProxy is shorter than the timeout + failOver > between FederationInterceptor (AMRMProxy) and RM. When the first register > thread in FI is blocked because of an RM failover, AM can timeout and resend > register call, leading to two outstanding register call inside FI. > Eventually when RM comes back up, one thread succeeds register and the other > thread got an application already registered exception. FI should swallow the > exception and return success back to AM in both threads. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114654#comment-16114654 ] Yufei Gu commented on YARN-6361: Thanks for taking this, [~daemon]. This JIRA is dedicated to the performance issue in FSLeafQueue.fetchAppsWithDemand. We can always open new JIRAs or use existing ones for other performance issues. YARN-4090 is for improving the performance of the Schedulable#getResourceUsage calculation, which is your second point. > FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: YunFan Zhou >Priority: Minor > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. > Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop {{(O\(n\))}} and sorting > by the cached values inside {{(O(n*log\(n\)))}}. This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
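The optimization discussed here — hoisting the expensive per-app calculation out of the comparator so it runs O(n) times instead of once per comparison — can be sketched like this (illustrative code, not the actual FairScheduler classes):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the optimization (not the actual FairScheduler code):
// compute each app's sort key once, O(n), then sort on the cached key, so the
// comparator does no resource math during the O(n log n) comparisons.
public class CachedSort {
  public static class App {
    final String name;
    final long usage; // stand-in for an expensive getResourceUsage() result
    public App(String name, long usage) { this.name = name; this.usage = usage; }
    public String name() { return name; }
  }

  public static List<App> sortByCachedUsage(List<App> apps) {
    // O(n): evaluate the expensive metric once per app, outside the sort loop.
    Map<App, Long> cachedKey = new HashMap<>();
    for (App a : apps) {
      cachedKey.put(a, a.usage);
    }
    // O(n log n) comparisons, each a cheap map lookup instead of a recomputation.
    List<App> sorted = new ArrayList<>(apps);
    sorted.sort(Comparator.comparingLong(cachedKey::get));
    return sorted;
  }
}
```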
[jira] [Comment Edited] (YARN-6789) new api to get all supported resources from RM
[ https://issues.apache.org/jira/browse/YARN-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114722#comment-16114722 ] Sunil G edited comment on YARN-6789 at 8/4/17 5:58 PM: --- Submitting patch to run jenkins. This is an initial version of patch for api improvements. cc/[~leftnoteasy] [~templedf] was (Author: sunilg): Submitting patch to run jenkins. This is an initial version of patch for api improvements. cc/[~leftnoteasy] > new api to get all supported resources from RM > -- > > Key: YARN-6789 > URL: https://issues.apache.org/jira/browse/YARN-6789 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-6789-YARN-3926.001.patch > > > It will be better to provide an api to get all supported resource types from > RM. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6726) Fix issues with docker commands executed by container-executor
[ https://issues.apache.org/jira/browse/YARN-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114746#comment-16114746 ] Hadoop QA commented on YARN-6726: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | 
{color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 12s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 32m 2s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6726 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880436/YARN-6726.003.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux dc1077390d7d 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 02bf328 | | Default Java | 1.8.0_131 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16712/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16712/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
> Fix issues with docker commands executed by container-executor > -- > > Key: YARN-6726 > URL: https://issues.apache.org/jira/browse/YARN-6726 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-6726.001.patch, YARN-6726.002.patch, > YARN-6726.003.patch > > > docker inspect, rm, stop, etc are issued through container-executor. Commands > other than docker run are not functioning properly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-6934) ResourceUtils.checkMandatoryResources() should also ensure that no min or max is set for vcores or memory
[ https://issues.apache.org/jira/browse/YARN-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton resolved YARN-6934. Resolution: Invalid > ResourceUtils.checkMandatoryResources() should also ensure that no min or max > is set for vcores or memory > - > > Key: YARN-6934 > URL: https://issues.apache.org/jira/browse/YARN-6934 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton > Labels: newbie++ > Attachments: YARN-6934.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6934) ResourceUtils.checkMandatoryResources() should also ensure that no min or max is set for vcores or memory
[ https://issues.apache.org/jira/browse/YARN-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114863#comment-16114863 ] Daniel Templeton commented on YARN-6934: Thanks for the patch, [~maniraj...@gmail.com]. I just took a closer look at the code, and it looks like I was wrong when I filed this JIRA. {{setMinimumAllocationForMandatoryResources()}} and {{setMaximumAllocationForMandatoryResources()}} explicitly allow for the min and max to be set on CPU and memory. I'm going to close it as invalid. I have added you as a contributor for the YARN project now, so you can assign JIRAs to yourself in the future. YARN-6933? > ResourceUtils.checkMandatoryResources() should also ensure that no min or max > is set for vcores or memory > - > > Key: YARN-6934 > URL: https://issues.apache.org/jira/browse/YARN-6934 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton > Labels: newbie++ > Attachments: YARN-6934.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient by caching resource usage
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-4090: --- Summary: Make Collections.sort() more efficient by caching resource usage (was: Make Collections.sort() more efficient by ) > Make Collections.sort() more efficient by caching resource usage > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: zhangshilong > Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, > YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, > YARN-4090.005.patch, YARN-4090.006.patch, YARN-4090-preview.patch, > YARN-4090-TestResult.pdf > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6726) Fix issues with docker commands executed by container-executor
[ https://issues.apache.org/jira/browse/YARN-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-6726: -- Attachment: YARN-6726.003.patch > Fix issues with docker commands executed by container-executor > -- > > Key: YARN-6726 > URL: https://issues.apache.org/jira/browse/YARN-6726 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-6726.001.patch, YARN-6726.002.patch, > YARN-6726.003.patch > > > docker inspect, rm, stop, etc are issued through container-executor. Commands > other than docker run are not functioning properly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6726) Fix issues with docker commands executed by container-executor
[ https://issues.apache.org/jira/browse/YARN-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114829#comment-16114829 ] Shane Kumpf commented on YARN-6726: --- Thanks for the review, [~sunilg]. I have attached a new patch that addresses your comments. {quote} Could you please to share why we need a prefix of UTILS here, is this a standard. #ifndef UTILS_STRING_UTILS_H {quote} I don't know if it's a standard, but I've seen the convention elsewhere, and it aligns with YARN-6852 that Wangda had called out above. It is the relative path to the file (utils/strings-utils.h). > Fix issues with docker commands executed by container-executor > -- > > Key: YARN-6726 > URL: https://issues.apache.org/jira/browse/YARN-6726 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-6726.001.patch, YARN-6726.002.patch, > YARN-6726.003.patch > > > docker inspect, rm, stop, etc are issued through container-executor. Commands > other than docker run are not functioning properly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient by
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-4090: --- Summary: Make Collections.sort() more efficient by (was: Make Collections.sort() more efficient in FSParentQueue.java) > Make Collections.sort() more efficient by > -- > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: zhangshilong > Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, > YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, > YARN-4090.005.patch, YARN-4090.006.patch, YARN-4090-preview.patch, > YARN-4090-TestResult.pdf > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6892) Improve API implementation in Resources and DominantResourceCalculator in align to ResourceInformation
[ https://issues.apache.org/jira/browse/YARN-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114721#comment-16114721 ] Sunil G commented on YARN-6892: --- YARN-6788 is committed. Submitting patch for jenkins. > Improve API implementation in Resources and DominantResourceCalculator in > align to ResourceInformation > -- > > Key: YARN-6892 > URL: https://issues.apache.org/jira/browse/YARN-6892 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-6892-YARN-3926.001.patch > > > In YARN-3926, APIs in Resources and DRC spend significant CPU cycles in most > of their methods. For better performance, it's better to improve the APIs, as > the resource types order is defined at the system level (the ResourceUtils class ensures > this post YARN-6788) > This work is preceding to YARN-6788 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6953) Clean up ResourceUtils.setMinimumAllocationForMandatoryResources() and setMaximumAllocationForMandatoryResources()
Daniel Templeton created YARN-6953: -- Summary: Clean up ResourceUtils.setMinimumAllocationForMandatoryResources() and setMaximumAllocationForMandatoryResources() Key: YARN-6953 URL: https://issues.apache.org/jira/browse/YARN-6953 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: YARN-3926 Reporter: Daniel Templeton Priority: Minor The {{setMinimumAllocationForMandatoryResources()}} and {{setMaximumAllocationForMandatoryResources()}} methods are quite convoluted. They'd be much simpler if they just handled CPU and memory manually instead of trying to be clever about doing it in a loop. There are also issues, such as the log warning always talking about memory or the last element of the inner array being a copy of the first element. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
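A hedged sketch of the suggested cleanup, with illustrative names (not the real ResourceUtils code): handling the two mandatory resources explicitly, rather than via a parallel-array loop, lets each warning name the resource that actually triggered it instead of always talking about memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch of the suggested cleanup (names are illustrative, not the
// real ResourceUtils code): memory and vcores are handled explicitly, so the
// warning names the offending resource instead of always saying "memory".
public class MandatoryResourceMins {
  static final String MEMORY = "memory-mb";
  static final String VCORES = "vcores";

  static long sanitize(String resource, long configured) {
    if (configured < 0) {
      // The message names the resource that actually triggered it.
      System.err.println("Invalid minimum allocation for " + resource
          + ": " + configured + "; using 0");
      return 0;
    }
    return configured;
  }

  public static Map<String, Long> minimumAllocations(long memoryMb, long vcores) {
    Map<String, Long> mins = new LinkedHashMap<>();
    mins.put(MEMORY, sanitize(MEMORY, memoryMb));
    mins.put(VCORES, sanitize(VCORES, vcores));
    return mins;
  }
}
```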
[jira] [Updated] (YARN-6952) Enable scheduling monitor in FS
[ https://issues.apache.org/jira/browse/YARN-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6952: --- Attachment: YARN-6952.001.patch > Enable scheduling monitor in FS > --- > > Key: YARN-6952 > URL: https://issues.apache.org/jira/browse/YARN-6952 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, resourcemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6952.001.patch > > > {{SchedulingEditPolicy#init}} doesn't need to take interface > {{PreemptableResourceScheduler}} as the scheduler input. A ResourceScheduler > is good enough. With that change, fair scheduler is able to use scheduling > monitor(e.g. invariant checks) as CS does. Further more, there is no need for > interface {{PreemptableResourceScheduler}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6934) ResourceUtils.checkMandatoryResources() should also ensure that no min or max is set for vcores or memory
[ https://issues.apache.org/jira/browse/YARN-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-6934: --- Attachment: YARN-6934.001.patch > ResourceUtils.checkMandatoryResources() should also ensure that no min or max > is set for vcores or memory > - > > Key: YARN-6934 > URL: https://issues.apache.org/jira/browse/YARN-6934 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton > Labels: newbie++ > Attachments: YARN-6934.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-65) Reduce RM app memory footprint once app has completed
[ https://issues.apache.org/jira/browse/YARN-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114793#comment-16114793 ] Hadoop QA commented on YARN-65: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 4m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 
35s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 27s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 13 new + 137 unchanged - 1 fixed = 150 total (was 138) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 43m 30s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-65 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880287/YARN-65.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f4b5a072a392 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 02bf328 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16711/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16711/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16711/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Reduce RM app memory footprint once app has completed > - > > Key: YARN-65 > URL: https://issues.apache.org/jira/browse/YARN-65 >
[jira] [Created] (YARN-6952) Enable scheduling monitor in FS
Yufei Gu created YARN-6952: -- Summary: Enable scheduling monitor in FS Key: YARN-6952 URL: https://issues.apache.org/jira/browse/YARN-6952 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler, resourcemanager Reporter: Yufei Gu Assignee: Yufei Gu {{SchedulingEditPolicy#init}} doesn't need to take interface {{PreemptableResourceScheduler}} as the scheduler input. A ResourceScheduler is good enough. With that change, the fair scheduler is able to use the scheduling monitor (e.g. invariant checks) as CS does. Furthermore, there is no need for the interface {{PreemptableResourceScheduler}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
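The signature change proposed above can be illustrated with simplified stand-ins for the real interfaces (these are not the actual YARN types): `init` accepts the broad `ResourceScheduler`, so any scheduler — the fair scheduler included, which does not implement `PreemptableResourceScheduler` — can drive a scheduling monitor such as an invariant check.

```java
// Simplified stand-ins for the real interfaces (not the actual YARN types),
// illustrating the proposed change: init() takes the broad ResourceScheduler,
// so any scheduler can plug into a scheduling monitor.
public class MonitorSketch {
  interface ResourceScheduler {
    String getName();
  }

  interface SchedulingEditPolicy {
    void init(ResourceScheduler scheduler); // was: PreemptableResourceScheduler
    String editSchedule();
  }

  static class InvariantCheckPolicy implements SchedulingEditPolicy {
    private ResourceScheduler scheduler;

    @Override
    public void init(ResourceScheduler s) {
      this.scheduler = s;
    }

    @Override
    public String editSchedule() {
      // A real monitor would verify scheduler invariants here.
      return "checked " + scheduler.getName();
    }
  }
}
```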
[jira] [Commented] (YARN-6934) ResourceUtils.checkMandatoryResources() should also ensure that no min or max is set for vcores or memory
[ https://issues.apache.org/jira/browse/YARN-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114849#comment-16114849 ] Manikandan R commented on YARN-6934: Based on the discussion, attached a patch for review. It also contains the changes required for YARN-6933, as it is closely related. > ResourceUtils.checkMandatoryResources() should also ensure that no min or max > is set for vcores or memory > - > > Key: YARN-6934 > URL: https://issues.apache.org/jira/browse/YARN-6934 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton > Labels: newbie++ > Attachments: YARN-6934.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6954) Remove interface PreemptableResourceScheduler
Yufei Gu created YARN-6954: -- Summary: Remove interface PreemptableResourceScheduler Key: YARN-6954 URL: https://issues.apache.org/jira/browse/YARN-6954 Project: Hadoop YARN Issue Type: Improvement Components: capacity scheduler, resourcemanager Reporter: Yufei Gu Once YARN-6952 is done, the only place that references the interface PreemptableResourceScheduler is the Capacity Scheduler. We could remove PreemptableResourceScheduler for simplicity. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114935#comment-16114935 ] Suma Shivaprasad commented on YARN-3254: Updated the patch to display the exact root cause for the disk error/capacity exceeded cases. These error diagnostics were already available in DirectoryCollection earlier but were ignored and not surfaced in the health report. Along with the ratio of disks marked as unhealthy for local/log dirs, the reason why each of them was marked unhealthy will be surfaced in the health report. Sample errors below: {noformat} 1/1 local-dirs have errors: [ /invalidDir1 : Cannot create directory: /invalidDir1 ] 1/1 log-dirs usable space is below configured utilization percentage/no more usable space [ /hadoop-3.0.0-beta1-SNAPSHOT/logs/userlogs : used space above threshold of 1.0% ] {noformat} > HealthReport should include disk full information > - > > Key: YARN-3254 > URL: https://issues.apache.org/jira/browse/YARN-3254 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Akira Ajisaka >Assignee: Suma Shivaprasad > Fix For: 3.0.0-beta1 > > Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot > 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch, > YARN-3254-003.patch, YARN-3254-004.patch > > > When a NodeManager's local disk gets almost full, the NodeManager sends a > health report to ResourceManager that "local/log dir is bad" and the message > is displayed on ResourceManager Web UI. It's difficult for users to detect > why the dir is bad. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6955) Concurrent registerAM thread in Federation Interceptor
[ https://issues.apache.org/jira/browse/YARN-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-6955: --- Attachment: YARN-6955.v1.patch > Concurrent registerAM thread in Federation Interceptor > -- > > Key: YARN-6955 > URL: https://issues.apache.org/jira/browse/YARN-6955 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6955.v1.patch > > > The timeout between the AM and AMRMProxy is shorter than the timeout + failover > between FederationInterceptor (AMRMProxy) and the RM. When the first register > thread in FI is blocked because of an RM failover, the AM can time out and resend > the register call, leading to two outstanding register calls inside FI. > Eventually, when the RM comes back up, one thread succeeds in registering and the other > thread gets an application-already-registered exception. FI should swallow the > exception and return success back to the AM in both threads. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6413) Decouple Yarn Registry API from ZK
[ https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ellen Hui updated YARN-6413: Attachment: 0003-Registry-API-api-only.patch This patch only adds the API, records, types, and exceptions; the implementation has not been touched. It compiles on top of yarn-native-services f1a358e178e. [~jianhe], can you please take a look at the latest patch and let me know if this way of splitting the interface will work for you? > Decouple Yarn Registry API from ZK > -- > > Key: YARN-6413 > URL: https://issues.apache.org/jira/browse/YARN-6413 > Project: Hadoop YARN > Issue Type: Improvement > Components: amrmproxy, api, resourcemanager >Reporter: Ellen Hui >Assignee: Ellen Hui > Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch, > 0003-Registry-API-api-only.patch > > > Right now the Yarn Registry API (defined in the RegistryOperations interface) > is a very thin layer over Zookeeper. This jira proposes changing the > interface to abstract away the implementation details so that we can write a > FS-based implementation of the registry service, which will be used to > support AMRMProxy HA. > The new interface will use register/delete/resolve APIs instead of > Zookeeper-specific operations like mknode. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
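The register/delete/resolve abstraction proposed above can be sketched as a small backend-neutral interface with an in-memory stand-in for the eventual FS-based implementation. This is an illustrative sketch only: the names RegistryClient and InMemoryRegistry, and the string-valued records, are assumptions for the example, not the actual YARN-6413 API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical backend-neutral registry surface: callers see only
// register/resolve/delete, never ZK-specific operations like mknode.
interface RegistryClient {
    void register(String path, String record);   // create or update a record
    String resolve(String path);                 // look up a record; null if absent
    void delete(String path);                    // remove a record
}

// An in-memory implementation, standing in for a ZK- or FS-backed one.
// Swapping the backend requires no change to callers of RegistryClient.
class InMemoryRegistry implements RegistryClient {
    private final Map<String, String> records = new HashMap<>();

    @Override
    public void register(String path, String record) {
        records.put(path, record);
    }

    @Override
    public String resolve(String path) {
        return records.get(path);
    }

    @Override
    public void delete(String path) {
        records.remove(path);
    }
}
```

A ZK-backed or FS-backed implementation would then differ only in how records are persisted, which is the point of decoupling the API from ZK.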
[jira] [Commented] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115205#comment-16115205 ] YunFan Zhou commented on YARN-6361: --- [~yufeigu] Thank you, Yufei. Sorry, I'm off the subject. But either way, improving the efficiency of *fetchAppsWithDemand* is something that must be done. I have thought about the optimal method for two days and tested my thoughts today. *Thank you very much for your confidence in me.* > FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: YunFan Zhou >Priority: Minor > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. > Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop {{(O\(n\))}} and > sorting by the precomputed fixed values inside it instead {{(O(n*log\(n\)))}}. This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
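The optimization described in the issue (move the per-app calculation out of the comparator so it runs O(n) times instead of O(n log n) times) can be illustrated as below. AppDemand, computeKey, and the key formula are hypothetical stand-ins for FairScheduler's real FSAppAttempt and FairShareComparator logic, not the actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: precompute each app's expensive sort key once,
// then sort by the cached primitive value.
class AppDemand {
    final String appId;
    final long usage;
    final long weight;
    double cachedKey;       // precomputed sort key, filled in before sorting

    AppDemand(String appId, long usage, long weight) {
        this.appId = appId;
        this.usage = usage;
        this.weight = weight;
    }
}

class DemandSorter {
    // Stand-in for the expensive per-app computation the comparator would
    // otherwise redo on every single comparison.
    static double computeKey(AppDemand a) {
        return (double) a.usage / Math.max(1, a.weight);
    }

    static List<AppDemand> fetchAppsWithDemand(List<AppDemand> apps) {
        List<AppDemand> sorted = new ArrayList<>(apps);
        // O(n): compute each key exactly once, outside the sort loop.
        for (AppDemand a : sorted) {
            a.cachedKey = computeKey(a);
        }
        // O(n log n) comparisons, but each one just reads a cached double.
        sorted.sort(Comparator.comparingDouble(a -> a.cachedKey));
        return sorted;
    }
}
```

With 10k apps per queue, the saving comes from computeKey running 10k times rather than once per comparison inside the sort.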
[jira] [Comment Edited] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115205#comment-16115205 ] YunFan Zhou edited comment on YARN-6361 at 8/5/17 1:49 AM: --- [~yufeigu] Thank Yufei. Sorry, I'm off the subject. But either way, the efficiency of raising the *fetchAppsWithDemand* is something that must be done. I have thought about the optimal method for two days and tested my thoughts today. Thank you very much for your confidence in me. was (Author: daemon): [~yufeigu] Thank Yufei. Sorry, I'm off the subject. But either way, the efficiency of raising the *fetchAppsWithDemand *is something that must be done. I have thought about the optimal method for two days and tested my thoughts today. Thank you very much for your confidence in me. > FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: YunFan Zhou >Priority: Minor > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. > Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop {{(O\(n\))}} and > sorting by the precomputed fixed values inside it instead {{(O(n*log\(n\)))}}. This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5938) Refactoring OpportunisticContainerAllocator to use SchedulerRequestKey instead of Priority and other misc fixes
[ https://issues.apache.org/jira/browse/YARN-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5938: -- Fix Version/s: 2.9.0 > Refactoring OpportunisticContainerAllocator to use SchedulerRequestKey > instead of Priority and other misc fixes > --- > > Key: YARN-5938 > URL: https://issues.apache.org/jira/browse/YARN-5938 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5938.001.patch, YARN-5938.002.patch, > YARN-5938.003.patch, YARN-5938-YARN-5085.001.patch, > YARN-5938-YARN-5085.002.patch, YARN-5938-YARN-5085.003.patch, > YARN-5938-YARN-5085.004.patch > > > Minor code re-organization to do the following: > # The OpportunisticContainerAllocatorAMService currently allocates outside > the ApplicationAttempt lock maintained by the ApplicationMasterService. This > should happen inside the lock. > # Refactored out some code to simplify the allocate() method. > # Removed some unused fields inside the OpportunisticContainerAllocator. > # Re-organized some of the code in the > OpportunisticContainerAllocatorAMService::allocate method to make it a bit > more readable. > # Moved SchedulerRequestKey to a new package, so it can be used by the > OpportunisticContainerAllocator/Context. > # Moved all usages of Priority in the OpportunisticContainerAllocator -> > SchedulerRequestKey. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6952) Enable scheduling monitor in FS
[ https://issues.apache.org/jira/browse/YARN-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115156#comment-16115156 ] Hadoop QA commented on YARN-6952: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 
35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 65 unchanged - 1 fixed = 66 total (was 66) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 3s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 49s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6952 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880453/YARN-6952.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 78d21be2ade1 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f44b349 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16717/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/16717/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16717/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Updated] (YARN-6802) Add Max AM Resource and AM Resource Usage to Leaf Queue View in FairScheduler WebUI
[ https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6802: --- Attachment: YARN-6802.branch-2.001.patch > Add Max AM Resource and AM Resource Usage to Leaf Queue View in FairScheduler > WebUI > --- > > Key: YARN-6802 > URL: https://issues.apache.org/jira/browse/YARN-6802 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: YunFan Zhou >Assignee: YunFan Zhou > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > YARN-6802.001.patch, YARN-6802.002.patch, YARN-6802.003.patch, > YARN-6802.branch-2.001.patch > > > The RM web UI should support viewing leaf queue AM resource usage. > !screenshot-2.png! > I will upload my patch later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-65) Reduce RM app memory footprint once app has completed
[ https://issues.apache.org/jira/browse/YARN-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115160#comment-16115160 ] Hadoop QA commented on YARN-65: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 
43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 31s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 13 new + 137 unchanged - 1 fixed = 150 total (was 138) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 47m 7s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 75m 29s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-65 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880287/YARN-65.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux c2b2bfa9805e 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f44b349 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16718/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16718/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16718/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Reduce RM app memory footprint once app has completed > - > > Key: YARN-65 > URL: https://issues.apache.org/jira/browse/YARN-65 >
[jira] [Commented] (YARN-6945) Display the ACL of leaf queue in RM scheduler page
[ https://issues.apache.org/jira/browse/YARN-6945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115185#comment-16115185 ] YunFan Zhou commented on YARN-6945: --- [~Naganarasimha] Hi, Naganarasimha G R. Any suggestions? > Display the ACL of leaf queue in RM scheduler page > -- > > Key: YARN-6945 > URL: https://issues.apache.org/jira/browse/YARN-6945 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: YunFan Zhou >Assignee: YunFan Zhou > Attachments: screenshot-1.png, YARN-6945.001.patch > > > {code:java} > java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed > to submit application_1501748492123_0298 to YARN : User yarn cannot submit > applications to queue root.jack > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1785) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) > {code} > Sometimes, when we submit our application to a queue, the submission may fail > because we have no permission to submit to the corresponding queue. > But there is no place apart from fair-scheduler.xml to see the ACL of the queue; > you can't see such information on the RM scheduler page. > So, I want to add the *aclSubmitApps* and *aclAdministerApps* information to the RM > scheduler page. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6952) Enable scheduling monitor in FS
[ https://issues.apache.org/jira/browse/YARN-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115162#comment-16115162 ] Yufei Gu commented on YARN-6952: The test failure is unrelated. > Enable scheduling monitor in FS > --- > > Key: YARN-6952 > URL: https://issues.apache.org/jira/browse/YARN-6952 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, resourcemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6952.001.patch > > > {{SchedulingEditPolicy#init}} doesn't need to take interface > {{PreemptableResourceScheduler}} as the scheduler input. A ResourceScheduler > is good enough. With that change, fair scheduler is able to use scheduling > monitor(e.g. invariant checks) as CS does. Further more, there is no need for > interface {{PreemptableResourceScheduler}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6802) Add Max AM Resource and AM Resource Usage to Leaf Queue View in FairScheduler WebUI
[ https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115161#comment-16115161 ] Yufei Gu commented on YARN-6802: Uploaded the branch-2 patch and committed it to branch-2. > Add Max AM Resource and AM Resource Usage to Leaf Queue View in FairScheduler > WebUI > --- > > Key: YARN-6802 > URL: https://issues.apache.org/jira/browse/YARN-6802 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: YunFan Zhou >Assignee: YunFan Zhou > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > YARN-6802.001.patch, YARN-6802.002.patch, YARN-6802.003.patch, > YARN-6802.branch-2.001.patch > > > The RM web UI should support viewing leaf queue AM resource usage. > !screenshot-2.png! > I will upload my patch later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6955) Concurrent registerAM thread in Federation Interceptor
[ https://issues.apache.org/jira/browse/YARN-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115167#comment-16115167 ] Subru Krishnan commented on YARN-6955: -- Thanks [~botong] for surfacing this issue. The patch looks mostly good (pending Yetus warnings fix) except that we should save the registration request only if _this.amRegistrationRequest == null_. > Concurrent registerAM thread in Federation Interceptor > -- > > Key: YARN-6955 > URL: https://issues.apache.org/jira/browse/YARN-6955 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6955.v1.patch > > > The timeout between the AM and AMRMProxy is shorter than the timeout + failover > between FederationInterceptor (AMRMProxy) and the RM. When the first register > thread in FI is blocked because of an RM failover, the AM can time out and resend > the register call, leading to two outstanding register calls inside FI. > Eventually, when the RM comes back up, one thread succeeds in registering and the other > thread gets an application-already-registered exception. FI should swallow the > exception and return success back to the AM in both threads. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
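The behavior discussed in this review — save the registration request only on the first call, and swallow the already-registered error so a retrying AM still gets a success response — might look roughly like the following. This is a simplified, single-threaded illustration; the class name, the string request/response types, and the IllegalStateException standing in for the real already-registered exception are all assumptions for the sketch, not FederationInterceptor code.

```java
// Hypothetical sketch of an idempotent register path in an interceptor.
class RegisterOnceInterceptor {
    private String amRegistrationRequest;   // saved only on the first call
    private String cachedResponse;          // last successful response
    private int upstreamCalls;              // counts calls that reached the fake RM

    synchronized String registerApplicationMaster(String request) {
        // Save the registration request only if none was saved yet
        // (the point raised in the review comment above).
        if (amRegistrationRequest == null) {
            amRegistrationRequest = request;
        }
        try {
            cachedResponse = callResourceManager(amRegistrationRequest);
        } catch (IllegalStateException alreadyRegistered) {
            // The RM reports the app as already registered, meaning an
            // earlier register call succeeded: swallow the exception and
            // return the cached success instead of failing the AM.
            // (Simplified: assumes a success was cached before the retry.)
        }
        return cachedResponse;
    }

    // Fake RM: succeeds on the first register, then throws an
    // "already registered" error on any subsequent attempt.
    private String callResourceManager(String request) {
        upstreamCalls++;
        if (upstreamCalls > 1) {
            throw new IllegalStateException("application already registered");
        }
        return "REGISTERED";
    }

    int getUpstreamCalls() { return upstreamCalls; }
}
```

With this shape, a timed-out AM that resends register gets the same success response as the original call, instead of seeing the RM's already-registered failure.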
[jira] [Commented] (YARN-6033) Add support for sections in container-executor configuration file
[ https://issues.apache.org/jira/browse/YARN-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115181#comment-16115181 ] Miklos Szegedi commented on YARN-6033: -- Sorry, I think I found one more in the latest patch. {code}
// free an entry set of values
void free_values(char** values) {
  if (*values != NULL) {
    free(*values);
  }
  if (values != NULL) {
    free(values);
  }
}
{code} If I understand correctly, this does not free all values, just the first value. This is expected if the items come from strtok, so this would definitely deserve a comment. Moreover, if strtok finds a delimiter on the first character, the first value points inside the string, so free will crash and leak the memory. > Add support for sections in container-executor configuration file > - > > Key: YARN-6033 > URL: https://issues.apache.org/jira/browse/YARN-6033 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-6033.003.patch, YARN-6033.004.patch, > YARN-6033.005.patch, YARN-6033.006.patch, YARN-6033.007.patch, > YARN-6033.008.patch, YARN-6033.009.patch, YARN-6033-YARN-5673.001.patch, > YARN-6033-YARN-5673.002.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YunFan Zhou updated YARN-6361: -- Priority: Major (was: Minor) > FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: YunFan Zhou > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. > Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop {{(O\(n\))}} and > sorting by the precomputed fixed values inside it instead {{(O(n*log\(n\)))}}. This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115205#comment-16115205 ] YunFan Zhou edited comment on YARN-6361 at 8/5/17 1:49 AM: --- [~yufeigu] Thank Yufei. Sorry, I'm off the subject. But either way, the efficiency of raising the *fetchAppsWithDemand *is something that must be done. I have thought about the optimal method for two days and tested my thoughts today. Thank you very much for your confidence in me. was (Author: daemon): [~yufeigu] Thank Yufei. Sorry, I'm off the subject. But either way, the efficiency of raising the *fetchAppsWithDemand *is something that must be done. I have thought about the optimal method for two days and tested my thoughts today. *Thank you very much for your confidence in me.* > FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: YunFan Zhou >Priority: Minor > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. > Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop {{(O\(n\))}} and > sorting by the precomputed fixed values inside it instead {{(O(n*log\(n\)))}}. This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6949) Invalid event: LOCALIZATION_FAILED at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115220#comment-16115220 ] lujie commented on YARN-6949: - I checked the log and also found some NullPointerExceptions: {code:java} java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:505) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1131) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1093) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:720) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:355) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) {code} > Invalid event: LOCALIZATION_FAILED at LOCALIZED > --- > > Key: YARN-6949 > URL: https://issues.apache.org/jira/browse/YARN-6949 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: lujie > > While a job was running, I stopped a NodeManager on one machine. Then I checked > the logs to see the running state, and I found many > InvalidStateTransitionExceptions: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > LOCALIZATION_FAILED at LOCALIZED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > 
at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource.handle(LocalizedResource.java:198) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.handle(LocalResourcesTrackerImpl.java:194) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.handle(LocalResourcesTrackerImpl.java:58) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1058) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:720) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:355) > at > org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48) > at > org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5966) AMRMClient changes to support ExecutionType update
[ https://issues.apache.org/jira/browse/YARN-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5966: -- Fix Version/s: 2.9.0 > AMRMClient changes to support ExecutionType update > -- > > Key: YARN-5966 > URL: https://issues.apache.org/jira/browse/YARN-5966 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Fix For: 2.9.0, 3.0.0-alpha4 > > Attachments: YARN-5966.001.patch, YARN-5966.002.patch, > YARN-5966.003.patch, YARN-5966.004.patch, YARN-5966.005.patch, > YARN-5966.006.patch, YARN-5966.007.patch, YARN-5966.008.patch, > YARN-5966.008.patch, YARN-5966.wip.001.patch > > > {{AMRMClient}} changes to support change of container ExecutionType
[jira] [Commented] (YARN-5966) AMRMClient changes to support ExecutionType update
[ https://issues.apache.org/jira/browse/YARN-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115085#comment-16115085 ] Arun Suresh commented on YARN-5966: --- Committed this to branch-2 > AMRMClient changes to support ExecutionType update > -- > > Key: YARN-5966 > URL: https://issues.apache.org/jira/browse/YARN-5966 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Fix For: 2.9.0, 3.0.0-alpha4 > > Attachments: YARN-5966.001.patch, YARN-5966.002.patch, > YARN-5966.003.patch, YARN-5966.004.patch, YARN-5966.005.patch, > YARN-5966.006.patch, YARN-5966.007.patch, YARN-5966.008.patch, > YARN-5966.008.patch, YARN-5966.wip.001.patch > > > {{AMRMClient}} changes to support change of container ExecutionType
[jira] [Commented] (YARN-6811) [ATS1.5] All history logs should be kept under its own User Directory.
[ https://issues.apache.org/jira/browse/YARN-6811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115124#comment-16115124 ] Junping Du commented on YARN-6811: -- I have committed the patch to trunk. For branch-2, my cherry-pick hit several conflicts, and the build still fails even after I fixed them. [~rohithsharma], can you upload a patch for branch-2? > [ATS1.5] All history logs should be kept under its own User Directory. > --- > > Key: YARN-6811 > URL: https://issues.apache.org/jira/browse/YARN-6811 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineclient, timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: YARN-6811.01.patch, YARN-6811.02.patch > > > ATS1.5 allows storing history data in underlying FileSystem folder paths, i.e. > */active-dir* and */done-dir*. These base directories are protected from > unauthorized access to other users' data by setting the sticky bit on > /active-dir. > But object store filesystems such as WASB do not have user access control > on folders and files. When WASB is used as the underlying file system for > ATS1.5, the history data stored in the FS is accessible to all users. > *This would be a security risk.* > I would propose to keep history data under its own user directory, i.e. > */active-dir/$USER*. Even though this does not solve basic user access control > at the FS level, it provides the capability to plug in Apache Ranger policies > for each user's folder. One thing to note is that setting policies on each user > folder is an admin responsibility. But grouping all of one user's history data > under a single folder allows setting policies so that user access control is > achieved.
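The proposal above is essentially a path-layout change: instead of writing every application's history directly under the shared active dir, each user gets a subtree that a policy engine can target. A small sketch of the two layouts — names and method structure are illustrative, not the actual {{FileSystemTimelineWriter}} code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrates the layout change proposed in YARN-6811.
class TimelineDirLayout {
    // Current layout: every application's data sits directly under the shared
    // active dir, so FS-level isolation depends entirely on the sticky bit.
    static Path sharedLayout(String activeDir, String appId) {
        return Paths.get(activeDir, appId);
    }

    // Proposed layout: one subtree per user. A policy engine such as Apache Ranger
    // can then attach a per-user read policy to each /active-dir/$USER subtree.
    static Path perUserLayout(String activeDir, String user, String appId) {
        return Paths.get(activeDir, user, appId);
    }

    public static void main(String[] args) {
        System.out.println(sharedLayout("/active-dir", "application_1501234567890_0001"));
        System.out.println(perUserLayout("/active-dir", "alice", "application_1501234567890_0001"));
    }
}
```

As the comment notes, the grouping by itself enforces nothing on an object store like WASB; its value is that a per-user policy now has a single subtree to attach to.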
[jira] [Updated] (YARN-6777) Support for ApplicationMasterService processing chain of interceptors
[ https://issues.apache.org/jira/browse/YARN-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-6777: -- Fix Version/s: 2.9.0 > Support for ApplicationMasterService processing chain of interceptors > - > > Key: YARN-6777 > URL: https://issues.apache.org/jira/browse/YARN-6777 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-6777.001.patch, YARN-6777.002.patch, > YARN-6777.003.patch, YARN-6777.004.patch, YARN-6777.005.patch, > YARN-6777.006.patch > > > This JIRA extends the Processor introduced in YARN-6776 with a configurable > processing chain of interceptors.
[jira] [Commented] (YARN-6777) Support for ApplicationMasterService processing chain of interceptors
[ https://issues.apache.org/jira/browse/YARN-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115127#comment-16115127 ] Arun Suresh commented on YARN-6777: --- Committed to branch-2 > Support for ApplicationMasterService processing chain of interceptors > - > > Key: YARN-6777 > URL: https://issues.apache.org/jira/browse/YARN-6777 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-6777.001.patch, YARN-6777.002.patch, > YARN-6777.003.patch, YARN-6777.004.patch, YARN-6777.005.patch, > YARN-6777.006.patch > > > This JIRA extends the Processor introduced in YARN-6776 with a configurable > processing chain of interceptors.
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: (was: YARN-6033.009.patch) > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch > > > This JIRA plans to add support for: > 1) Isolation in CGroups (native side).
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6033.009.patch > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch > > > This JIRA plans to add support for: > 1) Isolation in CGroups (native side).
[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115055#comment-16115055 ] Wangda Tan commented on YARN-6852: -- Hi Miklos, I really appreciate your thorough reviews, very helpful! I addressed most of your comments. A few items I haven't addressed in the updated patch: bq. Why do you have cgroup_cfg_section? You could eliminate it and get it all the time or just cache cgroups_root. I still prefer to keep it, since it can help us get more configs without changing the major code structure. bq. int input_argv_idx = 0; the first argument is the process name. Actually, argc and argv are modified in main.c before being passed to the modules; I removed the process name already: {code} +return handle_gpu_request(_cgroups_parameters, "gpu", argc - 1, + [1]); {code} Please let me know if you have any suggestions on the approach. bq. opts->keys = malloc(sizeof(char*) * (argc + 1)); Why argc+1 and not argc-1? Updated to argc. bq. required and has_values could be implemented as a bit array instead of a byte array. Another option ... Since container-executor is not a memory-intensive application, I would prefer to spend time on changing it when it is necessary or there are any safety concerns. :) bq. This pattern is C+0x. I think Varun mentioned this in YARN-6033; it is C99: https://stackoverflow.com/a/330867 bq. arr[idx] = n; There is no overflow check. This could also be exploitable. This might not be an issue, since we have already checked the input string once: {code} for (int i = 0; i < strlen(input); i++) { if (input[i] == ',') { n_numbers++; } } {code} bq. container_1 is an invalid container id in the unit tests. They will fail. Did you mean we should not fail the check? "container_1" is actually an invalid id in YARN. bq. There is no indentation after namespace ContainerExecutor I would prefer not to add extra indentation for namespaces. 
There are some discussions on SO: https://stackoverflow.com/questions/713698/c-namespaces-advice bq. static std::vector cgroups_parameters_invoked; I think you should consider std::string here. No need to malloc later bq. You do not clean up files in the unit tests, do you? Is there a reason? (TODO) Will include unit-test-related changes and cleanups in the next patch. Updated ver.003 patch. > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch > > > This JIRA plans to add support for: > 1) Isolation in CGroups (native side).
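The overflow argument in the comment above — size the output array from a first pass that counts separators, then fill it in a second pass over the same string, so the fill index cannot exceed the allocation — can be sketched as follows. The real code under review is C in container-executor; this is a Java rendering for brevity, and the class and method names are illustrative only:

```java
import java.util.Arrays;

// Two-pass parse of a comma-separated number list, as discussed for
// container-executor: the array is sized by counting commas first, so the
// fill loop stays in bounds as long as both passes see the same string.
class CommaList {
    static int[] parse(String input) {
        int n = 1;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == ',') {
                n++;                      // counting pass: one slot per comma, plus one
            }
        }
        int[] arr = new int[n];           // allocation sized by the counting pass
        int idx = 0;
        for (String tok : input.split(",", -1)) {
            arr[idx++] = Integer.parseInt(tok.trim()); // fill pass; idx < n by construction
        }
        return arr;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("0,1,3"))); // [0, 1, 3]
    }
}
```

The safety property holds only because the same immutable string feeds both passes; in the C version, anything that mutates the buffer between the count and the fill would reopen the overflow question.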
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.003.patch > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch > > > This JIRA plans to add support for: > 1) Isolation in CGroups (native side).
[jira] [Comment Edited] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115055#comment-16115055 ] Wangda Tan edited comment on YARN-6852 at 8/4/17 10:46 PM: --- Hi Miklos, I really appreciate your thorough reviews, very helpful! I addressed most of your comments. A few items I haven't addressed in the updated patch: bq. Why do you have cgroup_cfg_section? You could eliminate it and get it all the time or just cache cgroups_root. I still prefer to keep it, since it can help us get more configs without changing the major code structure. bq. int input_argv_idx = 0; the first argument is the process name. Actually, argc and argv are modified in main.c before being passed to the modules; I removed the process name already: {code} +return handle_gpu_request(_cgroups_parameters, "gpu", argc - 1, + [1]); {code} Please let me know if you have any suggestions on the approach. bq. opts->keys = malloc(sizeof(char*) * (argc + 1)); Why argc+1 and not argc-1? Updated to argc. bq. required and has_values could be implemented as a bit array instead of a byte array. Another option ... Since container-executor is not a memory-intensive application, I would prefer to spend time on changing it when it is necessary or there are any safety concerns. :) bq. This pattern is C+0x. I think Varun mentioned this in YARN-6033; it is C99: https://stackoverflow.com/a/330867 bq. arr[idx] = n; There is no overflow check. This could also be exploitable. This might not be an issue, since we have already checked the input string once: {code} for (int i = 0; i < strlen(input); i++) { if (input[i] == ',') { n_numbers++; } } {code} bq. container_1 is an invalid container id in the unit tests. They will fail. Did you mean we should not fail the check? "container_1" is actually an invalid id in YARN. bq. There is no indentation after namespace ContainerExecutor I would prefer not to add extra indentation for namespaces. There are some discussions on SO: https://stackoverflow.com/questions/713698/c-namespaces-advice bq. static std::vector cgroups_parameters_invoked; I think you should consider std::string here. No need to malloc later bq. You do not clean up files in the unit tests, do you? Is there a reason? (TODO) Will include unit-test-related changes and cleanups in the next patch. Updated ver.003 patch. [~miklos.szeg...@cloudera.com], mind checking again? > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL:
[jira] [Commented] (YARN-6811) [ATS1.5] All history logs should be kept under its own User Directory.
[ https://issues.apache.org/jira/browse/YARN-6811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115099#comment-16115099 ] Hudson commented on YARN-6811: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12122 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12122/]) YARN-6811. [ATS1.5] All history logs should be kept under its own User (junping_du: rev f44b349b813508f0f6d99ca10bddba683dedf6c4) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/EntityGroupFSTimelineStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/FileSystemTimelineWriter.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClientForATS1_5.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/test/java/org/apache/hadoop/yarn/server/timeline/TestEntityGroupFSTimelineStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > [ATS1.5] All history logs should be kept under its own User Directory. > --- > > Key: YARN-6811 > URL: https://issues.apache.org/jira/browse/YARN-6811 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineclient, timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: YARN-6811.01.patch, YARN-6811.02.patch > > > ATS1.5 allows storing history data in underlying FileSystem folder paths, i.e. > */active-dir* and */done-dir*. These base directories are protected from > unauthorized access to other users' data by setting the sticky bit on > /active-dir. 
> But object store filesystems such as WASB do not have user access control > on folders and files. When WASB is used as the underlying file system for > ATS1.5, the history data stored in the FS is accessible to all users. > *This would be a security risk.* > I would propose to keep history data under its own user directory, i.e. > */active-dir/$USER*. Even though this does not solve basic user access control > at the FS level, it provides the capability to plug in Apache Ranger policies > for each user's folder. One thing to note is that setting policies on each user > folder is an admin responsibility. But grouping all of one user's history data > under a single folder allows setting policies so that user access control is > achieved.
[jira] [Commented] (YARN-6033) Add support for sections in container-executor configuration file
[ https://issues.apache.org/jira/browse/YARN-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115111#comment-16115111 ] Hadoop QA commented on YARN-6033: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 44s{color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 
44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 29s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 38s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6033 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880477/YARN-6033.009.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml cc | | uname | Linux f21460b5090a 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f44b349 | | Default Java | 1.8.0_131 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16716/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16716/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add support for sections in container-executor configuration file > - > > Key: YARN-6033 > URL: https://issues.apache.org/jira/browse/YARN-6033 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-6033.003.patch, YARN-6033.004.patch, > YARN-6033.005.patch, YARN-6033.006.patch, YARN-6033.007.patch, > YARN-6033.008.patch, YARN-6033.009.patch, YARN-6033-YARN-5673.001.patch, > YARN-6033-YARN-5673.002.patch > >
[jira] [Commented] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115112#comment-16115112 ] Hadoop QA commented on YARN-3254: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 52s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 42s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 14 new + 35 unchanged - 0 fixed = 49 total (was 35) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 13s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 34m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-3254 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880458/YARN-3254-005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 6023976564c6 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f44b349 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16714/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16714/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16714/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16714/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT
[jira] [Commented] (YARN-6776) Refactor ApplicaitonMasterService to move actual processing logic to a separate class
[ https://issues.apache.org/jira/browse/YARN-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115117#comment-16115117 ] Arun Suresh commented on YARN-6776: --- Committed this to branch-2 > Refactor ApplicaitonMasterService to move actual processing logic to a > separate class > - > > Key: YARN-6776 > URL: https://issues.apache.org/jira/browse/YARN-6776 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Minor > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-6776.001.patch, YARN-6776.002.patch, > YARN-6776.003.patch, YARN-6776.004.patch > > > Minor refactoring to move the processing logic of the > {{ApplicationMasterService}} into a separate class. > The per appattempt locking as well as the extraction of the appAttemptId etc. > will remain in the ApplicationMasterService
[jira] [Updated] (YARN-6776) Refactor ApplicaitonMasterService to move actual processing logic to a separate class
[ https://issues.apache.org/jira/browse/YARN-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-6776: -- Fix Version/s: 2.9.0 > Refactor ApplicaitonMasterService to move actual processing logic to a > separate class > - > > Key: YARN-6776 > URL: https://issues.apache.org/jira/browse/YARN-6776 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Minor > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-6776.001.patch, YARN-6776.002.patch, > YARN-6776.003.patch, YARN-6776.004.patch > > > Minor refactoring to move the processing logic of the > {{ApplicationMasterService}} into a separate class. > The per appattempt locking as well as the extraction of the appAttemptId etc. > will remain in the ApplicationMasterService
[jira] [Commented] (YARN-6955) Concurrent registerAM thread in Federation Interceptor
[ https://issues.apache.org/jira/browse/YARN-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115131#comment-16115131 ] Hadoop QA commented on YARN-6955: - -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 50s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 13m 24s | trunk passed |
| +1 | compile | 1m 46s | trunk passed |
| +1 | checkstyle | 0m 35s | trunk passed |
| +1 | mvnsite | 0m 49s | trunk passed |
| -1 | findbugs | 0m 39s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings. |
| +1 | javadoc | 0m 30s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 7s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 43s | the patch passed |
| +1 | compile | 1m 38s | the patch passed |
| +1 | javac | 1m 38s | the patch passed |
| -0 | checkstyle | 0m 33s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) |
| +1 | mvnsite | 0m 48s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 0m 47s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5) |
| +1 | javadoc | 0m 34s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 12s | hadoop-yarn-server-common in the patch passed. |
| +1 | unit | 13m 31s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 15s | The patch does not generate ASF License warnings. |
| | | 47m 17s | |
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| | Inconsistent synchronization of org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor.amRegistrationRequest; locked 50% of time. Unsynchronized access at FederationInterceptor.java:[line 305] |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6955 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880476/YARN-6955.v1.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 11c3c52aa14f 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
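The new FindBugs warning above ("inconsistent synchronization ... locked 50% of time") means a shared field is written under a lock but read without one. A minimal illustration of the standard fix — routing every access through the same lock — with a hypothetical holder class standing in for FederationInterceptor:

```java
// Illustrative fix for an "inconsistent synchronization" FindBugs warning:
// every read and write of the shared field goes through the same monitor.
// (Making the field volatile is the other common remedy for simple get/set.)
public class RegistrationHolder {
  private Object amRegistrationRequest; // guarded by "this"

  public synchronized void setRequest(Object request) {
    amRegistrationRequest = request;
  }

  // An unsynchronized getter here is exactly what FindBugs flags.
  public synchronized Object getRequest() {
    return amRegistrationRequest;
  }
}
```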
[jira] [Commented] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115142#comment-16115142 ] Suma Shivaprasad commented on YARN-3254: [~sunilg] Can you please review the updated patch? > HealthReport should include disk full information > - > > Key: YARN-3254 > URL: https://issues.apache.org/jira/browse/YARN-3254 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Akira Ajisaka >Assignee: Suma Shivaprasad > Fix For: 3.0.0-beta1 > > Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot > 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch, > YARN-3254-003.patch, YARN-3254-004.patch, YARN-3254-005.patch > > > When a NodeManager's local disk gets almost full, the NodeManager sends a > health report to the ResourceManager saying "local/log dir is bad", and the message > is displayed on the ResourceManager Web UI. It's difficult for users to work out > why the dir is bad. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
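The kind of detail the health report could carry is easy to compute from the JDK alone. A minimal sketch (not the patch's code; the method name and message format are illustrative) that builds a per-directory free-space line:

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DiskUsageReport {
  // Builds a human-readable usage line for one local/log dir — the sort of
  // information that could accompany "local/log dir is bad" in the report.
  public static String usageLine(String dir) {
    try {
      FileStore store = Files.getFileStore(Paths.get(dir));
      long total = store.getTotalSpace();
      long free = store.getUsableSpace();
      long pctFree = total == 0 ? 0 : (free * 100) / total;
      return dir + ": " + pctFree + "% free";
    } catch (IOException e) {
      return dir + ": usage unavailable";
    }
  }
}
```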
[jira] [Commented] (YARN-6634) [API] Refactor ResourceManager WebServices to make API explicit
[ https://issues.apache.org/jira/browse/YARN-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115150#comment-16115150 ] Carlo Curino commented on YARN-6634: Thanks [~giovanni.fumarola] for the branch-2 version. The patch looks good to me; I committed this to branch-2 and I am closing this JIRA. > [API] Refactor ResourceManager WebServices to make API explicit > --- > > Key: YARN-6634 > URL: https://issues.apache.org/jira/browse/YARN-6634 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Subru Krishnan >Assignee: Giovanni Matteo Fumarola >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha4 > > Attachments: YARN-6634-branch-2.v1.patch, YARN-6634.proto.patch, > YARN-6634.v1.patch, YARN-6634.v2.patch, YARN-6634.v3.patch, > YARN-6634.v4.patch, YARN-6634.v5.patch, YARN-6634.v6.patch, > YARN-6634.v7.patch, YARN-6634.v8.patch, YARN-6634.v9.patch > > > The RM exposes a few REST endpoints, but there is no clear API interface defined. > This makes it painful to build either clients or extension components like > Router (YARN-5412) that expose REST interfaces themselves. This jira proposes > adding an RM WebServices protocol similar to the one we have for RPC, i.e. > {{ApplicationClientProtocol}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
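The "explicit API" idea is to capture the REST surface as a Java interface, the way {{ApplicationClientProtocol}} does for RPC, so clients and the Router can program against one contract. A hypothetical sketch — the interface name, methods, and return types below are illustrative, not the committed API:

```java
// Hypothetical shape of an explicit RM WebServices protocol: both the real
// web service and a Router-style proxy would implement the same interface.
public interface RMWebServiceProtocolSketch {
  String getClusterInfo();
  String getApps(String user);
}

// Trivial in-memory implementation, only to show the contract in use.
class StubRMWebService implements RMWebServiceProtocolSketch {
  public String getClusterInfo() {
    return "{\"state\":\"STARTED\"}";
  }

  public String getApps(String user) {
    return "[]"; // no apps for this user in the stub
  }
}
```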
[jira] [Updated] (YARN-6955) Concurrent registerAM thread in Federation Interceptor
[ https://issues.apache.org/jira/browse/YARN-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-6955: - Issue Type: Sub-task (was: Bug) Parent: YARN-5597 > Concurrent registerAM thread in Federation Interceptor > -- > > Key: YARN-6955 > URL: https://issues.apache.org/jira/browse/YARN-6955 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6955.v1.patch > > > The timeout between the AM and AMRMProxy is shorter than the timeout plus failover > time between FederationInterceptor (AMRMProxy) and the RM. When the first register > thread in FI is blocked because of an RM failover, the AM can time out and resend > the register call, leading to two outstanding register calls inside FI. > Eventually, when the RM comes back up, one thread's register succeeds and the other > thread gets an "application already registered" exception. FI should swallow the > exception and return success back to the AM in both threads. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
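The intended behavior can be sketched as follows; the class, exception type, and response format are hypothetical stand-ins for the FederationInterceptor internals, only the swallow-and-return-success shape comes from the description:

```java
// Sketch: when a second concurrent register finds the app already
// registered, swallow the exception and return the cached successful
// response instead of propagating the failure to the AM.
public class RegisterSketch {
  private volatile String cachedResponse;
  private boolean registered;

  public String registerApplicationMaster(String request) {
    try {
      cachedResponse = doRegister(request); // may throw if already registered
    } catch (IllegalStateException alreadyRegistered) {
      // The other thread's register succeeded; report success here too.
    }
    return cachedResponse;
  }

  private synchronized String doRegister(String request) {
    if (registered) {
      throw new IllegalStateException("Application Master is already registered");
    }
    registered = true;
    return "OK:" + request;
  }
}
```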
[jira] [Assigned] (YARN-3661) Basic Federation UI
[ https://issues.apache.org/jira/browse/YARN-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan reassigned YARN-3661: Assignee: Inigo Goiri (was: Giovanni Matteo Fumarola) > Basic Federation UI > > > Key: YARN-3661 > URL: https://issues.apache.org/jira/browse/YARN-3661 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Inigo Goiri > > The UIs provided by each RM give a correct "local" view of what is > running in that sub-cluster. In the context of federation, we need new > UIs that can track load, jobs, and users across sub-clusters. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6946) Upgrade JUnit from 4 to 5 in hadoop-yarn-common
[ https://issues.apache.org/jira/browse/YARN-6946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-6946: Attachment: YARN-6946.wip001.patch I'm trying to upgrade JUnit 4 to 5 in hadoop-yarn-common. Very hard work. > Upgrade JUnit from 4 to 5 in hadoop-yarn-common > --- > > Key: YARN-6946 > URL: https://issues.apache.org/jira/browse/YARN-6946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > Attachments: YARN-6946.wip001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
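Part of what makes this migration "very hard work" is that it is not purely mechanical. Since JUnit may not be on the classpath here, the sketch below mimics the two assertion styles with local helpers; the point is one concrete hazard of the 4 → 5 move, the changed position of the failure message:

```java
// JUnit 4: org.junit.Assert.assertEquals(String message, expected, actual)
// JUnit 5: org.junit.jupiter.api.Assertions.assertEquals(expected, actual, message)
// (The @Test annotation also moves from org.junit to org.junit.jupiter.api.)
// These helpers only imitate the signatures to show the argument-order flip.
public class JUnitMigrationSketch {
  static void assertEquals4(String message, Object expected, Object actual) {
    if (!expected.equals(actual)) {
      throw new AssertionError(message);
    }
  }

  static void assertEquals5(Object expected, Object actual, String message) {
    if (!expected.equals(actual)) {
      throw new AssertionError(message);
    }
  }
}
```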
[jira] [Comment Edited] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114258#comment-16114258 ] YunFan Zhou edited comment on YARN-6361 at 8/4/17 11:21 AM: [~yufeigu] Thanks Yufei. For this question, covering the optimization of FairScheduler's scheduling performance, I have the following ideas, which I have applied in our production environment. Scheduling performance is good: the container assignment rate can reach 5000 ~ 1 per second when the aggregate resource demand of the cluster is high. Here's what I do: * Avoid frequent sorting; it is pointless and a waste of time to re-sort before every container assignment, because after each assignment the child nodes of the queue basically stay in order. We also don't really need to guarantee that all fair shares are strictly honored: even if we sort before each container assignment, *FSQueue#demand* was last updated in the previous *FairScheduler#update* cycle, so the demand value is not real-time, which already means the sharing is not strictly fair. Instead, we can sort all the queues once per *FairScheduler#update* cycle (the default update interval is 0.5s), which is worth doing. Since we cannot achieve a strict fair share anyway, why not sacrifice some fair-scheduler semantics in exchange for better performance? * Improve the performance of the *Schedulable#getResourceUsage* calculation, reducing its complexity to O(1). There are several related smaller but especially useful optimization points here, and together they can guarantee that the cost of assigning a container is O(1). I don't know whether you can accept that; if you can, I will list a few more detailed points later. was (Author: daemon): [~yufeigu] Thank Yufei. 
For this question, including the optimization of the scheduling performance of the FairScheduler, I have the following ideas, and I apply these ideas to our production environment. The performance of the scheduling is ideal, and the speed of the assigning container can reach 5000 ~ 1 per second when aggregate resource requirements for the cluster is high. Here's what I do: * Avoid frequent ordering, and it's pointless and a waste of time to do a sequence before each assign container. Because, after each assignment, the whole child nodes of the queue are basically staying in order. And we don't really need to ensure that all of our fair shares is guaranteed, after all, even though we do a sort of order before each of the container's assignment because the *FSQueue#demand* is updated in the last time the *FairScheduler# update* cycle. So the value of demand is not real time, which also leads to the fact that we are not strictly and fairly shared. So, we can sort all the queues at the *FairScheduler#update* cycle, and we now have a default of 0.5 s per update cycle, which is worth doing. Since we have not been able to make a strict fair share, why don't we sacrifice some of our semantics of fair scheduler in exchange for better performance? * Improve the performance of the *Schedulable#getResourceUsage* calculation, making it complex in O(1). For one, there are several related smaller but especially useful optimization points. But I don't know if you can accept that. If you can accept it, I will list a few more detailed points later. > FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: YunFan Zhou >Priority: Minor > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. 
> Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop ({{O\(n\)}} work) and > sorting on the precomputed, fixed keys inside it ({{O(n*log\(n\))}} comparisons, each O(1)). This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
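The "calculations outside the sort loop" idea is a decorate-sort pattern: compute each schedulable's key once, then let the comparator do only cheap lookups. A minimal sketch under assumed names — {{expensiveUsage}} stands in for something like {{Schedulable#getResourceUsage}}:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrecomputedSortSketch {
  // Stand-in for an expensive per-app computation such as getResourceUsage().
  static long expensiveUsage(String appId) {
    return appId.length();
  }

  // Compute each key once (O(n)), then sort on the cached keys, so each of
  // the O(n log n) comparisons is O(1) instead of recomputing usage inside
  // the comparator on every compare call.
  static List<String> sortByUsage(List<String> apps) {
    Map<String, Long> usage = new HashMap<>();
    for (String app : apps) {
      usage.put(app, expensiveUsage(app));
    }
    List<String> sorted = new ArrayList<>(apps);
    sorted.sort(Comparator.comparingLong(usage::get));
    return sorted;
  }
}
```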
[jira] [Commented] (YARN-6946) Upgrade JUnit from 4 to 5 in hadoop-yarn-common
[ https://issues.apache.org/jira/browse/YARN-6946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114283#comment-16114283 ] Akira Ajisaka commented on YARN-6946: - I'm thinking it's too early to migrate JUnit from 4 to 5. JUnit 5 requires Java 8 or higher, so the migration affects only trunk, not branch-2. Most patches are currently backported to branch-2, and the backports would become much harder. > Upgrade JUnit from 4 to 5 in hadoop-yarn-common > --- > > Key: YARN-6946 > URL: https://issues.apache.org/jira/browse/YARN-6946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > Attachments: YARN-6946.wip001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6133) [ATSv2 Security] Renew delegation token for app automatically if an app collector is active
[ https://issues.apache.org/jira/browse/YARN-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114248#comment-16114248 ] Varun Saxena edited comment on YARN-6133 at 8/4/17 12:26 PM: - Thanks [~rohithsharma] for the review. bq. Token is renewed just before 10 seconds. Should it be increased? What do you suggest? 10 seconds should be enough, as we renew only in the DT manager, i.e. internally in the NM. The token doesn't need to go to the AM, right? bq. TimelineCollectorManager has introduced synchronized block. This is not necessary right.? This is to avoid a race between the collector stopping and the renewal timer expiring, so that an additional renewal timer is not set unnecessarily. It has no functional impact even if we do set it, because the timer just won't find the collector on expiry, but I thought it better to avoid it altogether. Thoughts? bq. Renewer threads count is 1. Given load on NM not much, one thread can renew it. But I would suggest to keep it to 50? How many active collectors do we expect on one NM? Token renewal and token generation are not very heavy tasks either. Assuming we have 1000 active apps in, say, a 5000-node cluster, the AMs will be distributed across many nodes, so it is unlikely that more than 4-5 app collectors will be running on any NM at a particular moment. Even then, it is unlikely that all collectors will have their token renewals expire at the same moment. There are no guarantees, but it is unlikely. We may have a situation where we launch AMs on a particular node partition, though; in that case there might be some hotspotting, i.e. multiple app collectors on one node. Even there, 50 seems too many. We can keep a value higher than 1 if you have concerns with only 1 thread, maybe 3-5. Keep it configurable with a default of 3 or 5? was (Author: varun_saxena): bq. Token is renewed just before 10 seconds. Should it be increased? What do you suggest? 10 seconds should be enough as we renew only in DT manager i.e. 
internally in NM. Token doesn't need to go to AM. Right? bq. TimelineCollectorManager has introduced synchronized block. This is not necessary right.? This is to avoid race between Collector stopping and renewal timer expiring. So that additional renewal timer is not set unnecessarily. Has no functional impact though even if we set because it just won't find collector on expiry. But I thought better to avoid it altogether. Thoughts? bq. Renewer threads count is 1. Given load on NM not much, one thread can renew it. But I would suggest to keep it to 50? How many active collectors do we expect in one NM? Token renewal and token generation is not a very heavy task as well. Assuming we have 1000 active apps in say a 5000 node large cluster, we will have AMs distributed across multiple nodes. So it is unlikely you will have more than 4-5 app collectors running in any NM at a particular moment. And even there it is unlikely that all collectors will have their token renewal expiry at the same moment. There are no guarantees though. But it is unlikely. We may have a situation wherein we launch AMs on a particular node partition though. In this case there might be some hotspotting, as in multiple app collectors on one node. But even there, 50 might be too many I think. We can keep a value higher than 1 though if you have concerns with only 1 thread, maybe 3-5. Keep it configurable with default 3 or 5? 
> [ATSv2 Security] Renew delegation token for app automatically if an app > collector is active > --- > > Key: YARN-6133 > URL: https://issues.apache.org/jira/browse/YARN-6133 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-6133-YARN-5355.01.patch, > YARN-6133-YARN-5355.02.patch, YARN-6133-YARN-5355.03.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
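The renewer design debated above — a small, configurable pool that fires each renewal a fixed margin before token expiry — can be sketched as follows. The class and method names are hypothetical, not the patch's code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TokenRenewerSketch {
  private final ScheduledExecutorService renewers;

  // Pool size is the knob discussed above: configurable, small default (3-5).
  TokenRenewerSketch(int renewerThreads) {
    renewers = Executors.newScheduledThreadPool(renewerThreads);
  }

  // Fire the renewal a fixed margin (e.g. ~10s) before the expiry time;
  // clamp to zero if the token is already within the margin.
  static long renewalDelay(long nowMillis, long expiryMillis, long marginMillis) {
    return Math.max(0, expiryMillis - marginMillis - nowMillis);
  }

  void scheduleRenewal(Runnable renew, long expiryMillis, long marginMillis) {
    long delay = renewalDelay(System.currentTimeMillis(), expiryMillis, marginMillis);
    renewers.schedule(renew, delay, TimeUnit.MILLISECONDS);
  }

  void shutdown() {
    renewers.shutdownNow();
  }
}
```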
[jira] [Commented] (YARN-6930) Admins should be able to explicitly enable specific LinuxContainerRuntime in the NodeManager
[ https://issues.apache.org/jira/browse/YARN-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114336#comment-16114336 ] Greg Phillips commented on YARN-6930: - Whitelisting runtimes seems to be the best option. I would likely modify the way sandbox-mode is selected to rely on the runtime whitelist and the container environment instead of using {{yarn.nodemanager.runtime.linux.sandbox-mode}}. This would remove the redundant knob issue. > Admins should be able to explicitly enable specific LinuxContainerRuntime in > the NodeManager > > > Key: YARN-6930 > URL: https://issues.apache.org/jira/browse/YARN-6930 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Shane Kumpf > > Today, in the java land, all LinuxContainerRuntimes are always enabled when > using LinuxContainerExecutor and the user can simply invoke anything that > he/she wants - default, docker, java-sandbox. > We should have a way for admins to explicitly enable only specific runtimes > that he/she decides for the cluster. And by default, we should have > everything other than the default one disabled. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
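A whitelist of this sort would presumably be a yarn-site.xml setting. A hypothetical fragment — the property name below is illustrative of the proposal being discussed, not a settled configuration key:

```xml
<!-- Hypothetical: admin explicitly enables only the runtimes this
     cluster should allow; everything else stays disabled. -->
<property>
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <value>default,docker</value>
</property>
```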
[jira] [Updated] (YARN-6951) Fix debug log when Resource handler chain is enabled
[ https://issues.apache.org/jira/browse/YARN-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-6951: -- Priority: Minor (was: Major) > Fix debug log when Resource handler chain is enabled > > > Key: YARN-6951 > URL: https://issues.apache.org/jira/browse/YARN-6951 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Minor > Labels: newbie++ > Attachments: YARN-6951.001.patch > > > {code:title=LinuxContainerExecutor.java} > ... ... > if (LOG.isDebugEnabled()) { > LOG.debug("Resource handler chain enabled = " + (resourceHandlerChain > == null)); > } > ... ... > {code} > I think it is just a typo. When resourceHandlerChain is not null, it should print the > log "Resource handler chain enabled = true". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6951) Fix debug log when Resource handler chain is enabled
[ https://issues.apache.org/jira/browse/YARN-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114345#comment-16114345 ] Sunil G commented on YARN-6951: --- Looks fine. Will commit once Jenkins is run. > Fix debug log when Resource handler chain is enabled > > > Key: YARN-6951 > URL: https://issues.apache.org/jira/browse/YARN-6951 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Minor > Labels: newbie++ > Attachments: YARN-6951.001.patch > > > {code:title=LinuxContainerExecutor.java} > ... ... > if (LOG.isDebugEnabled()) { > LOG.debug("Resource handler chain enabled = " + (resourceHandlerChain > == null)); > } > ... ... > {code} > I think it is just a typo. When resourceHandlerChain is not null, it should print the > log "Resource handler chain enabled = true". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
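The corrected condition from the description above, extracted into a small testable helper (the helper class is illustrative; in LinuxContainerExecutor the expression simply becomes `resourceHandlerChain != null`):

```java
public class ResourceHandlerChainLog {
  // Fixed condition: "enabled" should be true when the chain is non-null,
  // not when it is null as the original LOG.debug line reported.
  static String enabledMessage(Object resourceHandlerChain) {
    return "Resource handler chain enabled = " + (resourceHandlerChain != null);
  }
}
```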