[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344707#comment-14344707 ] Tsuyoshi Ozawa commented on YARN-3249: -- One more minor comment, about indentation:
{code}
+    html.div()
+      .button()
+        .$onclick(String.format("confirmAction('%s')",
+            url(String.format("/killapp/%s", aid))))
+        .b("Kill Application")
+      ._()
+    ._();
{code}
The indentation of the lines above should match that of the following lines:
{code}
+    html.script().$type("text/javascript")
+      ._("function confirmAction(href) { "
+          + "b = confirm(\"Are you sure?\");"
+          + "if (b == true) {"
+          + " location.href = href;"
+          + "}"
+          + "}")
+      ._();
{code}
> Add the kill application to the Resource Manager Web UI > --- > > Key: YARN-3249 > URL: https://issues.apache.org/jira/browse/YARN-3249 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0, 2.7.0 >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Minor > Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, > YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.patch, killapp-failed.log, > killapp-failed2.log, screenshot.png, screenshot2.png > > > We want to be able to kill an application from the Web UI, similar to the JobTracker's Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
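The onclick wiring reviewed above nests two String.format calls. A standalone sketch of just that string composition (hypothetical helper, not the actual Hamlet API or the patch's code):

```java
// Hypothetical sketch: composes the "Kill Application" button's onclick value
// the way the reviewed snippet nests its String.format calls. The class and
// method names here are illustrative, not YARN's.
public class KillButtonSketch {
    // Builds the JavaScript confirm hook for a given application id.
    static String buildOnClick(String appId) {
        String href = String.format("/killapp/%s", appId);
        return String.format("confirmAction('%s')", href);
    }

    public static void main(String[] args) {
        // prints confirmAction('/killapp/application_1425000000000_0001')
        System.out.println(buildOnClick("application_1425000000000_0001"));
    }
}
```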
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344739#comment-14344739 ] Rohith commented on YARN-3222: -- Had a mail chat with [~jianhe] regarding the issues observed in this jira's discussion, and we decided to split this jira into 2 separate jiras. The issues observed in ReconnectNodeTransition are: # As per the defect description, the order of node_resource_update and node_added events sent to the schedulers. If a node_added event is sent to the schedulers, there is no need to send a node_resource_update event from RMNode again. # If the RMNode state is RUNNING, the node_usable event does not need to be sent. # If a node is reconnected with a different capability, RMNode#totalCapability retains the old capability; it has to be updated with the new capability. Points 1 and 2 will be handled in this jira; point 3 will be done in a separate jira. > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler via the events node_added, node_removed or node_resource_update. These > events should be sent in sequential order, i.e. a node_added event followed by a > node_resource_update event. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit.
> The Node_Resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE
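The reconnect-ordering fix discussed in the message above can be sketched as follows (a simplified illustration with assumed names, not YARN's actual RMNodeImpl or scheduler classes): when the http port changed, the node is removed and re-added, and node_added already carries the new capability, so no separate node_resource_update is needed; otherwise only a resource update is sent.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the event ordering described above (illustrative names).
public class ReconnectOrderSketch {
    enum SchedulerEvent { NODE_REMOVED, NODE_ADDED, NODE_RESOURCE_UPDATE }

    static List<SchedulerEvent> reconnect(boolean httpPortChanged,
                                          boolean capabilityChanged) {
        List<SchedulerEvent> events = new ArrayList<>();
        if (httpPortChanged) {
            // The old node must be removed and the new one added, in order.
            events.add(SchedulerEvent.NODE_REMOVED);
            events.add(SchedulerEvent.NODE_ADDED);
            // NODE_ADDED already carries the new capability, so no separate
            // NODE_RESOURCE_UPDATE is sent (point 1 in the comment above).
        } else if (capabilityChanged) {
            // Same node identity; only its resources changed.
            events.add(SchedulerEvent.NODE_RESOURCE_UPDATE);
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [NODE_REMOVED, NODE_ADDED]
        System.out.println(reconnect(true, true));
        // prints [NODE_RESOURCE_UPDATE]
        System.out.println(reconnect(false, true));
    }
}
```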
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344766#comment-14344766 ] Hadoop QA commented on YARN-3249: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702071/YARN-3249.5.patch against trunk revision 742f9d9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6817//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6817//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6817//console This message is automatically generated. 
[jira] [Created] (YARN-3286) RMNode#totalCapability has stale capability after NM is reconnected.
Rohith created YARN-3286: Summary: RMNode#totalCapability has stale capability after NM is reconnected. Key: YARN-3286 URL: https://issues.apache.org/jira/browse/YARN-3286 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith This was found while fixing YARN-3222, as mentioned in the comments [link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799] and [link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739]. Also, RMNode#ReconnectNodeTransition clean-up: it always removes the old node and adds a new node. Whether this is really required needs to be examined.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344790#comment-14344790 ] Rohith commented on YARN-3222: -- To handle the 3rd point, I have raised YARN-3286.
[jira] [Updated] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3222: - Attachment: 0004-YARN-3222.patch
[jira] [Updated] (YARN-3286) RMNode#totalCapability has stale capability after NM is reconnected.
[ https://issues.apache.org/jira/browse/YARN-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3286: - Attachment: YARN-3286-test-only.patch Attached a test patch that simulates the issue.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344802#comment-14344802 ] Rohith commented on YARN-3222: -- Kindly review the updated patch, which fixes points 1 & 2 as mentioned in the earlier comment.
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344848#comment-14344848 ] Varun Vasudev commented on YARN-3248: - Thanks for the feedback [~ozawa], [~vinodkv]. {quote} The blacklist is an instance of HashSet, so it can throw ConcurrentModificationException when the blacklist is modified in another thread. One alternative is to use Collections.newSetFromMap(new ConcurrentHashMap()) instead of HashSet. {quote} Good catch. Collections.newSetFromMap won't work because the blacklist itself is a set. I create a copy of the structure in the latest patch. bq. If AbstractYarnScheduler#getApplicationAttempt() can be used, I think it's more straightforward and simple. What do you think? Agreed. Changed the code. bq. Could you add tests to TestRMWebServicesApps? I'm not sure what tests to add; I'm not adding any new web services. {quote} The blacklist information is per application-attempt, and the scheduler will forget previous application-attempts today. I think this is a general behaviour with the way blacklisting is done today - each AM is expected to explicitly blacklist all the nodes it wants to blacklist, even if the previous attempt already informed about some of them before. That is how all resource requests work. Given the above, we should make it clear that blacklists are really for this app-attempt. {quote} I was under this impression as well, but the information is maintained on a per-app basis in AbstractYarnScheduler: {noformat} protected Map<ApplicationId, SchedulerApplication<T>> applications; {noformat} bq. W.r.t UI, showing the list of all the nodes is going to be a UI scalability problem - how about we move this list to the per-app page? That is the place where this is useful the most. Agreed. Made the change. bq. We should also add this information to the web-services. You mean the app information web service?
> Display count of nodes blacklisted by apps in the web UI > > > Key: YARN-3248 > URL: https://issues.apache.org/jira/browse/YARN-3248 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Screenshot.jpg, apache-yarn-3248.0.patch > > > It would be really useful when debugging app performance and failure issues > to get a count of the nodes blacklisted by individual apps displayed in the > web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
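The two alternatives debated in the thread above can be sketched side by side (illustrative names, not the patch's actual classes): a set backed by ConcurrentHashMap is safe to iterate while another thread adds to it, and a defensive copy shields readers regardless of the backing set, which is the approach the latest patch reportedly takes.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the concurrency alternatives discussed above (names are illustrative).
public class BlacklistSketch {
    // Alternative 1: a concurrent set; iteration never throws
    // ConcurrentModificationException even while other threads add to it.
    private final Set<String> blacklist =
        Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    void addBlacklistedNode(String node) {
        blacklist.add(node);
    }

    // Alternative 2: hand readers (e.g. the web UI) a defensive copy,
    // so later mutations by the scheduler cannot affect them.
    Set<String> getBlacklistSnapshot() {
        return new HashSet<>(blacklist);
    }

    public static void main(String[] args) {
        BlacklistSketch s = new BlacklistSketch();
        s.addBlacklistedNode("node1:45454");
        Set<String> snapshot = s.getBlacklistSnapshot();
        s.addBlacklistedNode("node2:45454"); // does not affect the snapshot
        System.out.println(snapshot.size()); // prints 1
        System.out.println(s.getBlacklistSnapshot().size()); // prints 2
    }
}
```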
[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3248: Attachment: App page.png All applications.png Uploaded screenshots from the latest patch.
[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3248: Attachment: apache-yarn-3248.1.patch Uploaded patch with changes.
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344853#comment-14344853 ] Hadoop QA commented on YARN-3248: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702133/apache-yarn-3248.1.patch against trunk revision 4228de9. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6819//console This message is automatically generated.
[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3248: Attachment: apache-yarn-3248.2.patch Uploaded new patch fixing the conflict.
[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list
[ https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344905#comment-14344905 ] Hudson commented on YARN-3281: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #121 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/121/]) YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/CHANGES.txt > Add RMStateStore to StateMachine visualization list > --- > > Key: YARN-3281 > URL: https://issues.apache.org/jira/browse/YARN-3281 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3281.01.patch > > > The command "mvn compile -Pvisualize" should generate graph representations > for all state machines in the project. We are still missing > {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for > resourcemanager project. > Another class > {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}} > also has a state machine. However this one is a protected inner class, hence > cannot be seen by class {{VisualizeStateMachine}}. Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)
[ https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344903#comment-14344903 ] Hudson commented on YARN-3265: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #121 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/121/]) YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's available resource-limit from the parent queue. Contributed by Wangda Tan. (vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java > CapacityScheduler deadlock when computing absolute max avail capacity (fix > for 
trunk/branch-2) > -- > > Key: YARN-3265 > URL: https://issues.apache.org/jira/browse/YARN-3265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, > YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch > > > This patch is trying to solve the same problem described in YARN-3251, but > this is a longer term fix for trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344899#comment-14344899 ] Hudson commented on YARN-3270: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #121 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/121/]) YARN-3270. Fix node label expression not getting set in ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * hadoop-yarn-project/CHANGES.txt > node label expression not getting set in ApplicationSubmissionContext > - > > Key: YARN-3270 > URL: https://issues.apache.org/jira/browse/YARN-3270 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3270.patch > > > One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not > setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344914#comment-14344914 ] Junping Du commented on YARN-3039: -- Thanks for the comments, [~Naganarasimha]! bq. +1 for this approach. Also, if the NM uses this new blocking call in AMRMClient to get the aggregator address, then there might not be any race conditions for posting the AM container's life cycle events by the NM immediately after creation of the appAggregator through the AUX service. Discussed this with [~vinodkv] and [~zjshen] again offline. It looks heavyweight to make TimelineClient wrap AMRMClient, especially since for security reasons it would make the NM take AMRMTokens to use TimelineClient in future, which makes less sense. To get rid of the race condition you mentioned above, we propose to use the observer pattern so that TimelineClient can listen for aggregator address updates in the AM or NM (wrapped with retry logic to tolerate connection failures). bq. Are we just adding a method to get the aggregator address? Or what other APIs are planned? Per the above comments, we have no plan to add an API to TimelineClient to talk to the RM directly. bq. I believe the idea of using the AUX service was to decouple the NM and the Timeline service. If the NM will notify the RM about new appAggregator creation (based on the AUX service), then basically the NM should be aware that PerNodeAggregatorServer is configured as an AUX service, and if it supports rebinding an appAggregator on failure, then it should be able to communicate with this AUX service too; would this be a clean approach? I agree we want to decouple things here. However, the AUX service is not the only way to deploy app aggregators. There are other ways (see the diagram in YARN-3033): app aggregators could be deployed in a separate process or in an independent container, which makes it less sensible to have a protocol between the AUX service and the RM.
I think we should now plan to add a protocol between the aggregator and the NM, and then notify the RM through the NM-RM heartbeat on aggregator registration/rebind. bq. I also feel we need to support starting a per-app aggregator only if the app requests it (Zhijie also mentioned this). If not, we can make use of one default aggregator for all such apps launched on the NM, which is just used to post container entities from different NMs for these apps. My 2 cents here: the app aggregator should have logic to consolidate all messages (events and metrics) for one application into the more complex and flexible new data model. If each NM does aggregation separately, then it is still a *writer* (like the old timeline service), not an *aggregator*. Thoughts? bq. Have any discussions happened w.r.t. the RM having its own aggregator? I feel it would be better for the RM to have one, as it need not depend on any NMs to post entities. Agreed. I think we are on the same page now. Will update the proposal to reflect all these discussions (JIRA and offline). > [Aggregator wireup] Implement ATS app-appgregator service discovery > --- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Attachments: Service Binding for applicationaggregator of ATS > (draft).pdf, YARN-3039-no-test.patch > > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures.
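The observer-pattern proposal in the comment above can be sketched minimally as follows (all names here are assumptions for illustration, not YARN's actual TimelineClient or discovery classes): the client registers a listener with a registry, and the registry notifies every listener whenever a per-app aggregator address is (re)bound.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal observer-pattern sketch of the discovery proposal above (assumed names).
public class AggregatorDiscoverySketch {
    // Observer: e.g. a TimelineClient wanting the current aggregator address.
    interface AggregatorAddressListener {
        void onAddressUpdated(String appId, String newAddress);
    }

    // Subject: holds listeners and fans out address changes.
    static class AggregatorAddressRegistry {
        private final List<AggregatorAddressListener> listeners =
            new CopyOnWriteArrayList<>();

        void addListener(AggregatorAddressListener l) {
            listeners.add(l);
        }

        // Would be called when an NM-RM heartbeat reports a (re)bound app
        // aggregator; every registered listener is notified.
        void updateAddress(String appId, String address) {
            for (AggregatorAddressListener l : listeners) {
                l.onAddressUpdated(appId, address);
            }
        }
    }

    public static void main(String[] args) {
        AggregatorAddressRegistry registry = new AggregatorAddressRegistry();
        final String[] seen = new String[1];
        registry.addListener((appId, addr) -> seen[0] = appId + "@" + addr);
        registry.updateAddress("application_1", "host1:8188");
        System.out.println(seen[0]); // prints application_1@host1:8188
    }
}
```

CopyOnWriteArrayList keeps listener registration safe even if addresses are updated from a heartbeat-handling thread while new listeners attach.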
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344919#comment-14344919 ] Hadoop QA commented on YARN-3222: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702122/0004-YARN-3222.patch against trunk revision 9ae7f9e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6818//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6818//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6818//console This message is automatically generated. 
> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler with the events node_added, node_removed or node_resource_update. These > events should be notified in sequential order, i.e. the node_added event first > and the node_resource_update event next. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit. > The node_resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
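The ordering bug in the YARN-3222 description can be illustrated with a hedged, heavily simplified sketch (these are not the real scheduler classes): a resource update that arrives before the node has been added finds no entry for the node, which in the real scheduler surfaced as an NPE.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a scheduler's node map, to show why event
// order matters. Not the real CapacityScheduler/FairScheduler code.
class SchedulerSketch {
    final Map<String, Integer> nodes = new HashMap<>();

    void nodeAdded(String nodeId, int capacity) {
        nodes.put(nodeId, capacity);
    }

    // Returns false for an unknown node; in the real bug the lookup
    // returned null and was dereferenced, crashing the RM with an NPE.
    boolean nodeResourceUpdate(String nodeId, int capacity) {
        if (!nodes.containsKey(nodeId)) {
            return false;   // update arrived before node_added: out of order
        }
        nodes.put(nodeId, capacity);
        return true;
    }
}
```

Sending node_added strictly before node_resource_update, as the patch enforces, guarantees the lookup always succeeds.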
[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)
[ https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344954#comment-14344954 ] Hudson commented on YARN-3265: -- FAILURE: Integrated in Hadoop-Yarn-trunk #855 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/855/]) YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's available resource-limit from the parent queue. Contributed by Wangda Tan. (vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java > CapacityScheduler deadlock when computing absolute max avail capacity (fix > for trunk/branch-2) > -- > > Key: YARN-3265 > URL: https://issues.apache.org/jira/browse/YARN-3265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, > YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch > > > This patch is trying to solve the same problem described in YARN-3251, but > this is a longer term fix for trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344950#comment-14344950 ] Hudson commented on YARN-3270: -- FAILURE: Integrated in Hadoop-Yarn-trunk #855 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/855/]) YARN-3270. Fix node label expression not getting set in ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java > node label expression not getting set in ApplicationSubmissionContext > - > > Key: YARN-3270 > URL: https://issues.apache.org/jira/browse/YARN-3270 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3270.patch > > > One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not > setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
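The YARN-3270 bug is a common factory-method slip: an overload accepts a parameter but never stores it. A hedged sketch with hypothetical names (not the real ApplicationSubmissionContext) shows the before/after shape of such a fix:

```java
// Hypothetical illustration of the bug class fixed in YARN-3270:
// a newInstance overload that accepts a parameter but forgets to set it.
class SubmissionContextSketch {
    String nodeLabelExpression;

    // Buggy variant: 'labelExpression' is accepted but never assigned.
    static SubmissionContextSketch newInstanceBuggy(String labelExpression) {
        SubmissionContextSketch ctx = new SubmissionContextSketch();
        // missing: ctx.nodeLabelExpression = labelExpression;
        return ctx;
    }

    // Fixed variant: the parameter is actually set on the new instance.
    static SubmissionContextSketch newInstanceFixed(String labelExpression) {
        SubmissionContextSketch ctx = new SubmissionContextSketch();
        ctx.nodeLabelExpression = labelExpression;
        return ctx;
    }
}
```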
[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list
[ https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344956#comment-14344956 ] Hudson commented on YARN-3281: -- FAILURE: Integrated in Hadoop-Yarn-trunk #855 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/855/]) YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml > Add RMStateStore to StateMachine visualization list > --- > > Key: YARN-3281 > URL: https://issues.apache.org/jira/browse/YARN-3281 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3281.01.patch > > > The command "mvn compile -Pvisualize" should generate graph representations > for all state machines in the project. We are still missing > {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for > resourcemanager project. > Another class > {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}} > also has a state machine. However this one is a protected inner class, hence > cannot be seen by class {{VisualizeStateMachine}}. Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
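On the open question at the end of the YARN-3281 description (the protected inner class invisible to {{VisualizeStateMachine}}): one possible approach, sketched here with placeholder names and not a claim about what the tool currently supports, is to load the nested class reflectively by its binary name ({{Outer$Inner}}) and bypass the access modifier with {{setAccessible}}.

```java
import java.lang.reflect.Field;

// Sketch: reaching a non-public nested class by its binary name via
// reflection, as a visualization tool could. 'Hidden' stands in for a
// protected inner class like NMClientAsyncImpl.StatefulContainer.
class ReflectionSketch {
    static class Hidden {
        private static final String NAME = "StatefulContainer";
    }

    static String readHiddenName() {
        try {
            // Binary name of a nested class is Outer$Inner, not Outer.Inner
            Class<?> inner = Class.forName("ReflectionSketch$Hidden");
            Field f = inner.getDeclaredField("NAME");
            f.setAccessible(true);        // bypass the access modifier
            return (String) f.get(null);  // static field: no instance needed
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }
}
```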
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344966#comment-14344966 ] Hadoop QA commented on YARN-3248: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702135/apache-yarn-3248.2.patch against trunk revision 4228de9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6820//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6820//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6820//console This message is automatically generated. 
> Display count of nodes blacklisted by apps in the web UI > > > Key: YARN-3248 > URL: https://issues.apache.org/jira/browse/YARN-3248 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: All applications.png, App page.png, Screenshot.jpg, > apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch > > > It would be really useful when debugging app performance and failure issues > to get a count of the nodes blacklisted by individual apps displayed in the > web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)
[ https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345122#comment-14345122 ] Hudson commented on YARN-3265: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2053 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2053/]) YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's available resource-limit from the parent queue. Contributed by Wangda Tan. (vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java > CapacityScheduler deadlock when computing absolute max avail capacity (fix > for trunk/branch-2) > -- > > 
Key: YARN-3265 > URL: https://issues.apache.org/jira/browse/YARN-3265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, > YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch > > > This patch is trying to solve the same problem described in YARN-3251, but > this is a longer term fix for trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345118#comment-14345118 ] Hudson commented on YARN-3270: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2053 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2053/]) YARN-3270. Fix node label expression not getting set in ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java > node label expression not getting set in ApplicationSubmissionContext > - > > Key: YARN-3270 > URL: https://issues.apache.org/jira/browse/YARN-3270 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3270.patch > > > One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not > setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list
[ https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345124#comment-14345124 ] Hudson commented on YARN-3281: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2053 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2053/]) YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/CHANGES.txt > Add RMStateStore to StateMachine visualization list > --- > > Key: YARN-3281 > URL: https://issues.apache.org/jira/browse/YARN-3281 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3281.01.patch > > > The command "mvn compile -Pvisualize" should generate graph representations > for all state machines in the project. We are still missing > {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for > resourcemanager project. > Another class > {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}} > also has a state machine. However this one is a protected inner class, hence > cannot be seen by class {{VisualizeStateMachine}}. Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list
[ https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345143#comment-14345143 ] Hudson commented on YARN-3281: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #112 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/112/]) YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml > Add RMStateStore to StateMachine visualization list > --- > > Key: YARN-3281 > URL: https://issues.apache.org/jira/browse/YARN-3281 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3281.01.patch > > > The command "mvn compile -Pvisualize" should generate graph representations > for all state machines in the project. We are still missing > {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for > resourcemanager project. > Another class > {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}} > also has a state machine. However this one is a protected inner class, hence > cannot be seen by class {{VisualizeStateMachine}}. Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)
[ https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345141#comment-14345141 ] Hudson commented on YARN-3265: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #112 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/112/]) YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's available resource-limit from the parent queue. Contributed by Wangda Tan. (vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java > CapacityScheduler deadlock when computing absolute max avail capacity (fix > 
for trunk/branch-2) > -- > > Key: YARN-3265 > URL: https://issues.apache.org/jira/browse/YARN-3265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, > YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch > > > This patch is trying to solve the same problem described in YARN-3251, but > this is a longer term fix for trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345136#comment-14345136 ] Hudson commented on YARN-3270: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #112 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/112/]) YARN-3270. Fix node label expression not getting set in ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java > node label expression not getting set in ApplicationSubmissionContext > - > > Key: YARN-3270 > URL: https://issues.apache.org/jira/browse/YARN-3270 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3270.patch > > > One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not > setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345161#comment-14345161 ] Anubhav Dhoot commented on YARN-3242: - [~zxu], the patch looks good overall. Instead of blindly swapping in zkClient on a connect and removing it on a disconnect, we now verify that activeZkClient is the one receiving the event. It then makes sense to get rid of the oldZkClient logic and keep a single client, activeZkClient, that receives events and, on a connection event, gets activated for use as zkClient to do the actual processing. I verified that the updated unit test fails if I remove the {{if (zk != activeZkClient)}} check. Only minor nits: a) could we add comments that activeZkClient is not used for actual processing (that is still zkClient) but only to process watched events, and that on a connection event it gets activated into zkClient; b) will CountdownWatcher#setWatchedClient ever be called more than once? If not, rename it to initializeWatchedClient and let it throw if the client is already non-null. LGTM otherwise. > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch > > > Old ZK client session watcher event messed up the new ZK client session due to > ZooKeeper asynchronously closing the client session. > The watcher event from an old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This will cause a serious problem: ZKRMStateStore out of sync with the ZooKeeper > session. 
> We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher > event is from the current session. So a watcher event from an old ZK client > session which was just closed will still be processed. > For example, if a Disconnected event is received from the old session after the new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive a SyncConnected event from the new session, > because the new session is already in the SyncConnected state and it won't send > a SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until the RM shuts down. > The following code from ZooKeeper (ClientCnxn#EventThread) shows that even after > receiving eventOfDeath, the EventThread will still process all the events until > the waitingEvents queue is empty. 
> {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing clien
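The guard Anubhav describes can be reduced to a small pattern, sketched here with placeholder classes rather than the actual ZKRMStateStore code: keep a reference to the one active client, and drop any watcher event whose source is not that client.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the YARN-3242 fix pattern: watcher events from a
// stale ZooKeeper session are identified by their source client and
// ignored, so only the active session can mutate state.
class StaleEventFilter {
    private Object activeZkClient;            // the only client whose events we trust
    final List<String> processed = new ArrayList<>();

    // Called when a new client session is created/activated.
    void activate(Object zkClient) {
        activeZkClient = zkClient;
    }

    // Mirrors the "if (zk != activeZkClient) return;" guard in the patch:
    // reference equality is enough, since each session has its own client object.
    void processWatchEvent(Object sourceClient, String event) {
        if (sourceClient != activeZkClient) {
            return;                           // stale session: drop the event
        }
        processed.add(event);
    }
}
```

Without the guard, the old session's asynchronously delivered Disconnected event would null out zkClient even though the new session is healthy, which is exactly the wedged "Wait for ZKClient creation timed out" state described above.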
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345160#comment-14345160 ] Junping Du commented on YARN-3031: -- Discussed offline with [~vrushalic] and [~zjshen] last week and we agree to consolidate APIs here. [~vrushalic], mind giving a quick update? Thx! > [Storage abstraction] Create backing storage write interface for ATS writers > > > Key: YARN-3031 > URL: https://issues.apache.org/jira/browse/YARN-3031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C > Attachments: Sequence_diagram_write_interaction.2.png, > Sequence_diagram_write_interaction.png, YARN-3031.01.patch, > YARN-3031.02.patch, YARN-3031.03.patch > > > Per design in YARN-2928, come up with the interface for the ATS writer to > write to various backing storages. The interface should be created to capture > the right level of abstractions so that it will enable all backing storage > implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345176#comment-14345176 ] Hudson commented on YARN-3270: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/121/]) YARN-3270. Fix node label expression not getting set in ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java > node label expression not getting set in ApplicationSubmissionContext > - > > Key: YARN-3270 > URL: https://issues.apache.org/jira/browse/YARN-3270 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3270.patch > > > One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not > setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)
[ https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345180#comment-14345180 ] Hudson commented on YARN-3265: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/121/]) YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's available resource-limit from the parent queue. Contributed by Wangda Tan. (vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java > CapacityScheduler deadlock when computing absolute max avail capacity (fix > for trunk/branch-2) > -- > > Key: YARN-3265 > URL: https://issues.apache.org/jira/browse/YARN-3265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, > YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch > > > This patch is trying to solve the same problem described in YARN-3251, but > this is a longer term fix for trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list
[ https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345182#comment-14345182 ] Hudson commented on YARN-3281: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/121/]) YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml > Add RMStateStore to StateMachine visualization list > --- > > Key: YARN-3281 > URL: https://issues.apache.org/jira/browse/YARN-3281 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3281.01.patch > > > The command "mvn compile -Pvisualize" should generate graph representations > for all state machines in the project. We are still missing > {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for > resourcemanager project. > Another class > {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}} > also has a state machine. However this one is a protected inner class, hence > cannot be seen by class {{VisualizeStateMachine}}. Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1963: -- Attachment: 0001-YARN-1963-prototype.patch Uploading a prototype version based on configuration file. > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > Attachments: 0001-YARN-1963-prototype.patch, YARN Application > Priorities Design.pdf, YARN Application Priorities Design_01.pdf > > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)
[ https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345211#comment-14345211 ] Hudson commented on YARN-3265: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2071 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2071/]) YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's available resource-limit from the parent queue. Contributed by Wangda Tan. (vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java > CapacityScheduler deadlock when computing absolute max avail capacity (fix > for trunk/branch-2) > -- > > Key: YARN-3265 > URL: https://issues.apache.org/jira/browse/YARN-3265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, > YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch > > > This patch is trying to solve the same problem described in YARN-3251, but > this is a longer term fix for trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list
[ https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345213#comment-14345213 ] Hudson commented on YARN-3281: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2071 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2071/]) YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml > Add RMStateStore to StateMachine visualization list > --- > > Key: YARN-3281 > URL: https://issues.apache.org/jira/browse/YARN-3281 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3281.01.patch > > > The command "mvn compile -Pvisualize" should generate graph representations > for all state machines in the project. We are still missing > {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for > resourcemanager project. > Another class > {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}} > also has a state machine. However this one is a protected inner class, hence > cannot be seen by class {{VisualizeStateMachine}}. Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345207#comment-14345207 ] Hudson commented on YARN-3270: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2071 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2071/]) YARN-3270. Fix node label expression not getting set in ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * hadoop-yarn-project/CHANGES.txt > node label expression not getting set in ApplicationSubmissionContext > - > > Key: YARN-3270 > URL: https://issues.apache.org/jira/browse/YARN-3270 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3270.patch > > > One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not > setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3275) CapacityScheduler: Preemption happening on non-preemptable queues
[ https://issues.apache.org/jira/browse/YARN-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345290#comment-14345290 ] Jason Lowe commented on YARN-3275: -- Thanks for the patch, Eric! bq. the expectation of our users is that if they are running a job on a non-preemptable queue, their containers should never be preempted. I completely agree with this. IMHO the whole point of the preemption-disable feature is to guarantee that a queue marked as such will never be preempted. It's as if the entire preemption feature was turned off from that queue's perspective. Looking at the patch, I'm a bit worried about this part:
{code}
+    Resource absMaxCapIdealAssignedDelta = Resource.newInstance(0, 0);
+    if (Resources.greaterThanOrEqual(
+        rc, clusterResource, maxCapacity, idealAssigned)) {
+      absMaxCapIdealAssignedDelta = Resources.subtract(maxCapacity, idealAssigned);
+    }
{code}
If the intent of this calculation is to guarantee that none of the components of absMaxCapIdealAssignedDelta are negative, then I don't believe it accomplishes that goal. It's possible for Resources.greaterThanOrEqual to return true and yet for the subtraction to leave one of the components negative. For example, what if both resources are memory-dominant, maxCapacity has more memory than idealAssigned, but the opposite is true for vcores? Subtracting idealAssigned from maxCapacity will result in a positive memory component but a negative vcore component. If we need to make sure neither component goes negative, then I think we need to do a component-wise max with the zero resource rather than a comparison. Also one style nit: we normally don't do one-liner conditionals without braces, so I'd like to see the continue explicitly put in a block. It might be useful to put a debug log statement with the continue to note that we wanted to preempt this queue for some reason (and by how much) but it was marked with preemption disabled. 
> CapacityScheduler: Preemption happening on non-preemptable queues > - > > Key: YARN-3275 > URL: https://issues.apache.org/jira/browse/YARN-3275 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Eric Payne >Assignee: Eric Payne > Labels: capacity-scheduler > Attachments: YARN-3275.v1.txt > > > YARN-2056 introduced the ability to turn preemption on and off at the queue > level. In cases where a queue goes over its absolute max capacity (YARN-3243, > for example), containers can be preempted from that queue, even though the > queue is marked as non-preemptable. > We are using this feature in large, busy clusters and seeing this behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
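Jason's point about dominant-resource comparison versus component-wise subtraction can be illustrated with a small standalone sketch. The {{Res}} record and {{dominantGte}} comparator below are simplified stand-ins, not the actual Hadoop Resource/Resources API, and the numbers are made up to trigger the exact case he describes:

```java
// Toy model of the YARN-3275 concern: a dominant-share "greater than or equal"
// check can pass even though one component of (maxCapacity - idealAssigned)
// is negative. Res and dominantGte are simplified stand-ins for Hadoop's
// Resource/Resources classes, not the real API.
public class DeltaClampSketch {
    record Res(long memoryMb, int vcores) {}

    // Dominant-resource comparison: a >= b iff a's larger cluster share
    // is at least b's larger cluster share.
    static boolean dominantGte(Res cluster, Res a, Res b) {
        double aShare = Math.max((double) a.memoryMb() / cluster.memoryMb(),
                                 (double) a.vcores() / cluster.vcores());
        double bShare = Math.max((double) b.memoryMb() / cluster.memoryMb(),
                                 (double) b.vcores() / cluster.vcores());
        return aShare >= bShare;
    }

    static Res subtract(Res a, Res b) {
        return new Res(a.memoryMb() - b.memoryMb(), a.vcores() - b.vcores());
    }

    // The fix Jason suggests: clamp each component against the zero resource.
    static Res subtractNonNegative(Res a, Res b) {
        return new Res(Math.max(0L, a.memoryMb() - b.memoryMb()),
                       Math.max(0, a.vcores() - b.vcores()));
    }

    public static void main(String[] args) {
        Res cluster = new Res(102_400, 100);     // 100 GB, 100 vcores
        Res maxCapacity = new Res(51_200, 10);   // dominant share 0.5 (memory)
        Res idealAssigned = new Res(10_240, 20); // dominant share 0.2 (vcores)

        // The comparison passes (0.5 >= 0.2) ...
        System.out.println(dominantGte(cluster, maxCapacity, idealAssigned)); // true
        // ... yet the raw subtraction drives vcores negative (10 - 20):
        System.out.println(subtract(maxCapacity, idealAssigned).vcores());    // -10
        // The clamped delta keeps both components non-negative:
        System.out.println(subtractNonNegative(maxCapacity, idealAssigned).vcores()); // 0
    }
}
```

The same shapes apply however the dominant-resource calculator weighs the two components: only the component-wise clamp guarantees a non-negative delta.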
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345321#comment-14345321 ] Vrushali C commented on YARN-3031: -- Yes, we've decided to have only a write and an aggregate api in the writer interface. The addEvent and updateMetrics are not needed; we can use the write api to do this. Also, I have the distributed shell test case working end to end for timeline v2. I have some feedback from a chat with Zhijie on that. I am updating the code and I should be posting a patch for YARN-3167, YARN-3031 and YARN-3264 today. > [Storage abstraction] Create backing storage write interface for ATS writers > > > Key: YARN-3031 > URL: https://issues.apache.org/jira/browse/YARN-3031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C > Attachments: Sequence_diagram_write_interaction.2.png, > Sequence_diagram_write_interaction.png, YARN-3031.01.patch, > YARN-3031.02.patch, YARN-3031.03.patch > > > Per design in YARN-2928, come up with the interface for the ATS writer to > write to various backing storages. The interface should be created to capture > the right level of abstractions so that it will enable all backing storage > implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345345#comment-14345345 ] Junping Du commented on YARN-3031: -- Hi [~vrushalic], Awesome! I will help to review this patch here when patch is ready. Thanks for updating. > [Storage abstraction] Create backing storage write interface for ATS writers > > > Key: YARN-3031 > URL: https://issues.apache.org/jira/browse/YARN-3031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C > Attachments: Sequence_diagram_write_interaction.2.png, > Sequence_diagram_write_interaction.png, YARN-3031.01.patch, > YARN-3031.02.patch, YARN-3031.03.patch > > > Per design in YARN-2928, come up with the interface for the ATS writer to > write to various backing storages. The interface should be created to capture > the right level of abstractions so that it will enable all backing storage > implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN_3267_WIP2.patch > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, > YARN_3267_WIP2.patch > > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345366#comment-14345366 ] Prakash Ramachandran commented on YARN-3267: [~lichangleo] I am not very familiar with the YARN code, but I can test the patch. > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, > YARN_3267_WIP2.patch > > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345370#comment-14345370 ] Chang Li commented on YARN-3267: Hi [~pramachandran], could you please help test this patch? I developed it against branch-2. I couldn't reproduce this scenario on my single-node machine, so I am trying to write a unit test for it, but it would be great if you could also test it in a real scenario. Thanks a lot. > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, > YARN_3267_WIP2.patch > > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345406#comment-14345406 ] Ravi Prakash commented on YARN-2981: Hi Abin! The patch doesn't apply because the documentation has been converted from apt to markdown. Could you please update it? Could you please limit lines to 80 chars? Could you please also split out the functionality you are proposing to limit cpu shares and memory into another JIRA, along with the setting for the user the container is run as? > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-2981.patch > > > This allows the yarn administrator to add a cluster-wide default docker image > that will be used when there is no per-job override of docker images. With > this feature, it would be convenient for newer applications like slider to > launch inside a cluster-default docker container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3272) Surface container locality info
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345424#comment-14345424 ] Wangda Tan commented on YARN-3272: -- LGTM, +1. Will commit today if there are no objections. > Surface container locality info > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, > YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, > YARN-3272.6.patch, container locality table.png > > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
Jonathan Eagles created YARN-3287: - Summary: TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3288) Document and fix indentation in the DockerContainerExecutor code
Ravi Prakash created YARN-3288: -- Summary: Document and fix indentation in the DockerContainerExecutor code Key: YARN-3288 URL: https://issues.apache.org/jira/browse/YARN-3288 Project: Hadoop YARN Issue Type: Sub-task Reporter: Ravi Prakash Assignee: Ravi Prakash Priority: Trivial The DockerContainerExecutor has several lines over 80 chars and could use some more documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345513#comment-14345513 ] Jian He commented on YARN-3222: --- Thanks Rohith! I think the condition check you added earlier, sending the NodeResourceUpdate event only if the node resource is different, is useful; it saves some traffic. Would you mind adding that too?
{code}
if (rmNode.getState().equals(NodeState.RUNNING)) {
  // Update scheduler node's capacity for reconnect node.
  rmNode.context
      .getDispatcher()
      .getEventHandler()
      .handle(
          new NodeResourceUpdateSchedulerEvent(rmNode, ResourceOption
              .newInstance(newNode.getTotalCapability(), -1)));
}
{code}
> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler via the events node_added, node_removed, or node_resource_update. These > events should be notified in sequential order, i.e. the node_added event and > then the node_resource_update event. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit. > The Node_Resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
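The extra condition Jian asks for (sending the NodeResourceUpdate event only when the reported capability actually differs) can be sketched standalone. The {{NodeState}}, {{Res}}, and recorded-event types below are simplified stand-ins for the real RMNodeImpl and scheduler-event classes, not the Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of ReconnectNodeTransition's resource-update step with the
// suggested guard: only a RUNNING node whose capability actually changed
// triggers a NODE_RESOURCE_UPDATE. All types here are simplified stand-ins.
public class ReconnectGuardSketch {
    enum NodeState { NEW, RUNNING, UNHEALTHY }
    record Res(long memoryMb, int vcores) {}

    static final List<String> fired = new ArrayList<>();

    static void onReconnect(NodeState state, Res oldCap, Res newCap) {
        // A non-RUNNING node's capability reaches the scheduler via NODE_ADDED
        // anyway, and an unchanged capability makes the update pure event noise.
        if (state == NodeState.RUNNING && !oldCap.equals(newCap)) {
            fired.add("NODE_RESOURCE_UPDATE:" + newCap);
        }
    }

    public static void main(String[] args) {
        Res old = new Res(8_192, 8);
        onReconnect(NodeState.RUNNING, old, new Res(8_192, 8));  // same capability: skipped
        onReconnect(NodeState.RUNNING, old, new Res(16_384, 8)); // changed: event fires
        onReconnect(NodeState.NEW, old, new Res(16_384, 8));     // not RUNNING: skipped
        System.out.println(fired.size()); // 1
    }
}
```

Of the three reconnects, only the RUNNING node with a changed capability produces an event, which is exactly the traffic saving the comment describes.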
[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-2190: Attachment: YARN-2190.9.patch Attach a new patch that adds the following two new options for the new Windows container executor to control if memory and CPU limit should be set on the backing job object. By default, memory limit is enabled; CPU limit is disabled. {noformat} yarn.nodemanager.windows-container-executor.memory-limit.enabled yarn.nodemanager.windows-container-executor.cpu-limit.enabled {noformat} > Provide a Windows container executor that can limit memory and CPU > -- > > Key: YARN-2190 > URL: https://issues.apache.org/jira/browse/YARN-2190 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, > YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, > YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, YARN-2190.9.patch > > > Yarn default container executor on Windows does not set the resource limit on > the containers currently. The memory limit is enforced by a separate > monitoring thread. The container implementation on Windows uses Job Object > right now. The latest Windows (8 or later) API allows CPU and memory limits > on the job objects. We want to create a Windows container executor that sets > the limits on job objects thus provides resource enforcement at OS level. > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
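Operators would toggle the two new flags in yarn-site.xml. A sketch using the property names quoted above, with the values the comment states as defaults (memory limit on, CPU limit off):

```xml
<!-- Sketch only: property names are taken verbatim from the comment above;
     the values shown are the stated defaults, so setting them is optional. -->
<property>
  <name>yarn.nodemanager.windows-container-executor.memory-limit.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.windows-container-executor.cpu-limit.enabled</name>
  <value>false</value>
</property>
```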
[jira] [Created] (YARN-3289) Docker images should be downloaded during localization
Ravi Prakash created YARN-3289: -- Summary: Docker images should be downloaded during localization Key: YARN-3289 URL: https://issues.apache.org/jira/browse/YARN-3289 Project: Hadoop YARN Issue Type: Sub-task Reporter: Ravi Prakash We currently call docker run on images while launching containers. If the image size is sufficiently big, the task will time out. We should download the image we want to run during localization (if possible) to prevent this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345577#comment-14345577 ] Jian He commented on YARN-3168: --- thanks [~iwasakims] ! I'll do the review and commit. > Convert site documentation from apt to markdown > --- > > Key: YARN-3168 > URL: https://issues.apache.org/jira/browse/YARN-3168 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: Gururaj Shetty > Fix For: 3.0.0 > > Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, > YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch > > > YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization
[ https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345578#comment-14345578 ] Jason Lowe commented on YARN-3289: -- There is no application-level (e.g. MapReduce) task heartbeat during localization because the application code isn't running yet. Downloading a large docker image during localization will still time out, since the task can't heartbeat back to the AM to say it's making progress. > Docker images should be downloaded during localization > -- > > Key: YARN-3289 > URL: https://issues.apache.org/jira/browse/YARN-3289 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Ravi Prakash > > We currently call docker run on images while launching containers. If the > image size is sufficiently big, the task will time out. We should download the > image we want to run during localization (if possible) to prevent this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3210) [Source organization] Refactor timeline aggregator according to new code organization
[ https://issues.apache.org/jira/browse/YARN-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3210. --- Resolution: Fixed Fix Version/s: YARN-2928 Hadoop Flags: Reviewed Committed the patch to branch YARN-2928. Thanks for the patch, Li! And thanks for review, Vinod! > [Source organization] Refactor timeline aggregator according to new code > organization > - > > Key: YARN-3210 > URL: https://issues.apache.org/jira/browse/YARN-3210 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Labels: refactor > Fix For: YARN-2928 > > Attachments: YARN-3210-022715.patch, YARN-3210-030215.patch, > YARN-3210-030215_1.patch, YARN-3210-030215_2.patch > > > We may want to refactor the code of timeline aggregator according to the > discussion of YARN-3166, the code organization for timeline service v2. We > need to refactor the code after we reach an agreement on the aggregator part > of YARN-3166. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3285) Convert branch-2 .apt.vm files of YARN to markdown
[ https://issues.apache.org/jira/browse/YARN-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345607#comment-14345607 ] Jian He commented on YARN-3285: --- Some comments, which may be applicable to trunk too; we may file a separate jira to fix them. - In ResourceManagerRestart page - Inside the Notes, the *e{epoch}* / *e17* was highlighted before but not now. - yarn container command {code} list ApplicationId (should be Application Attempt ID ?) Lists containers for the application attempt. {code} - yarn application attempt command {code} list ApplicationId Lists applications attempts from the RM (should be Lists applications attempts for the given application) {code} I'll commit this into branch-2 later today. Thanks [~iwasakims]! > Convert branch-2 .apt.vm files of YARN to markdown > -- > > Key: YARN-3285 > URL: https://issues.apache.org/jira/browse/YARN-3285 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: YARN-3285.001.patch > > > Backport the conversion to markdown done in YARN-3168. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3272) Surface container locality info in RM web UI
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3272: - Summary: Surface container locality info in RM web UI (was: Surface container locality info ) > Surface container locality info in RM web UI > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, > YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, > YARN-3272.6.patch, container locality table.png > > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345638#comment-14345638 ] Hadoop QA commented on YARN-3267: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702200/YARN_3267_WIP2.patch against trunk revision 4228de9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.yarn.server.timeline.TestTimelineDataManager org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.mapred.TestMRIntermediateDataEncryption Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6821//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6821//console This message is automatically generated. > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, > YARN_3267_WIP2.patch > > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345649#comment-14345649 ] Varun Saxena commented on YARN-2962: [~kasha] / [~jianhe] / [~ozawa], kindly review > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > they individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345658#comment-14345658 ] zhihai xu commented on YARN-3242: - [~adhoot], thanks for the review; both suggestions sound good to me. I uploaded a new patch YARN-3242.004.patch, which addresses both comments. Please review it. Thanks, zhihai > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > A watcher event from an old ZK client session messed up the new ZK client session due to > ZooKeeper asynchronously closing the client session. > The watcher event from the old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This causes a serious problem: ZKRMStateStore goes out of sync with the ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher > event is from the current session. So a watcher event from an old ZK client > session which was just closed will still be processed. 
> For example, If a Disconnected event received from old session after new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive SyncConnected event from new session > because new session is already in SyncConnected state and it won't send > SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. > The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h, null, null, null); > } catch (InterruptedException e) { 
> // ignore, close the send/event threads > } finally { > disconnect(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
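The missing check the description points at — ZKRMStateStore#processWatchEvent not verifying that an event came from the current session — can be sketched in isolation. The class below is a hypothetical stand-in (none of the names are the actual ZKRMStateStore code): each watcher carries the session it was registered with, and events from a session that has since been replaced are dropped instead of mutating state such as nulling out zkClient.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a session-identity guard; not the real
// ZKRMStateStore implementation.
public class SessionGuard {
    private Object activeSession;                       // current ZK client handle
    private final List<String> processed = new ArrayList<>();

    public void setActiveSession(Object session) {
        this.activeSession = session;
    }

    // A watcher passes along the session it was created for; events from
    // any other (stale, already-closed) session are ignored.
    public void processWatchEvent(Object sourceSession, String event) {
        if (sourceSession != activeSession) {
            return; // stale event: do not touch client state
        }
        processed.add(event);
    }

    public List<String> getProcessed() {
        return processed;
    }

    public static void main(String[] args) {
        SessionGuard store = new SessionGuard();
        Object oldSession = new Object();
        Object newSession = new Object();
        store.setActiveSession(oldSession);
        store.processWatchEvent(oldSession, "SyncConnected");
        store.setActiveSession(newSession);                  // reconnect
        store.processWatchEvent(oldSession, "Disconnected"); // stale, dropped
        store.processWatchEvent(newSession, "SyncConnected");
        System.out.println(store.getProcessed());
    }
}
```

With a guard of this shape, a Disconnected event still sitting in the old session's EventThread queue can no longer null out the client that belongs to the new session.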
[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3242: Attachment: YARN-3242.004.patch > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > A watcher event from an old ZK client session messed up the new ZK client session due to > ZooKeeper asynchronously closing the client session. > The watcher event from the old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This causes a serious problem: ZKRMStateStore goes out of sync with the ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher > event is from the current session. So a watcher event from an old ZK client > session which was just closed will still be processed. > For example, if a Disconnected event is received from the old session after the new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive a SyncConnected event from the new session > because the new session is already in SyncConnected state and it won't send > a SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. 
> The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h, null, null, null); > } catch (InterruptedException e) { > // ignore, close the send/event threads > } finally { > disconnect(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-2981: -- Attachment: YARN-2981.patch Removed as [~raviprak] suggested. > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-2981.patch, YARN-2981.patch > > > This allows the yarn administrator to add a cluster-wide default docker image > that will be used when there are no per-job override of docker images. With > this features, it would be convenient for newer applications like slider to > launch inside a cluster-default docker container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-2981: -- Attachment: YARN-2981.patch Fixed docs. > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch > > > This allows the yarn administrator to add a cluster-wide default docker image > that will be used when there are no per-job override of docker images. With > this features, it would be convenient for newer applications like slider to > launch inside a cluster-default docker container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345737#comment-14345737 ] Hudson commented on YARN-3272: -- FAILURE: Integrated in Hadoop-trunk-Commit #7246 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7246/]) YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) (wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/CHANGES.txt > Surface container locality info in RM web UI > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.0 > > Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, > YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, 
YARN-3272.6.patch, > YARN-3272.6.patch, container locality table.png > > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345734#comment-14345734 ] Hadoop QA commented on YARN-2190: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702229/YARN-2190.9.patch against trunk revision e17e5ba. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6823//console This message is automatically generated. > Provide a Windows container executor that can limit memory and CPU > -- > > Key: YARN-2190 > URL: https://issues.apache.org/jira/browse/YARN-2190 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, > YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, > YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, YARN-2190.9.patch > > > Yarn default container executor on Windows does not set the resource limit on > the containers currently. The memory limit is enforced by a separate > monitoring thread. The container implementation on Windows uses Job Object > right now. The latest Windows (8 or later) API allows CPU and memory limits > on the job objects. We want to create a Windows container executor that sets > the limits on job objects thus provides resource enforcement at OS level. > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345747#comment-14345747 ] zhihai xu commented on YARN-2893: - I found another possibility which can also cause this exception in the non-secure case: the JobClient corrupted the tokens buffer. The RM code only checks the tokens buffer in RMAppManager#submitApplication in the secure case. {code} if (UserGroupInformation.isSecurityEnabled()) { try { this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId, parseCredentials(submissionContext), submissionContext.getCancelTokensWhenComplete(), application.getUser()); } catch (Exception e) { LOG.warn("Unable to parse credentials.", e); // Sending APP_REJECTED is fine, since we assume that the // RMApp is in NEW state and thus we haven't yet informed the // scheduler about the existence of the application assert application.getState() == RMAppState.NEW; this.rmContext.getDispatcher().getEventHandler() .handle(new RMAppRejectedEvent(applicationId, e.getMessage())); throw RPCUtil.getRemoteException(e); } protected Credentials parseCredentials( ApplicationSubmissionContext application) throws IOException { Credentials credentials = new Credentials(); DataInputByteBuffer dibb = new DataInputByteBuffer(); ByteBuffer tokens = application.getAMContainerSpec().getTokens(); if (tokens != null) { dibb.reset(tokens); credentials.readTokenStorageStream(dibb); tokens.rewind(); } return credentials; } {code} I think we should do the same in the non-secure case, so we can fail the application earlier to avoid confusion. I also found a Cascading patch that fixes the credentials corruption at the JobClient: https://github.com/Cascading/cascading/commit/45b33bb864172486ac43782a4d13329312d01c0e I will update the patch to check the tokens buffer in the non-secure case in RMAppManager#submitApplication. 
> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream > -- > > Key: YARN-2893 > URL: https://issues.apache.org/jira/browse/YARN-2893 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: zhihai xu > Attachments: YARN-2893.000.patch > > > MapReduce jobs on our clusters experience sporadic failures due to corrupt > tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
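The failure mode, and the eager validation proposed above, can be illustrated with plain java.io. This is a self-contained sketch — write/read/isValid are hypothetical helpers, not Hadoop's Credentials API: a length-prefixed record serialized to a buffer and then truncated fails with EOFException on read, which is exactly what validating the tokens buffer at submission time (secure or not) would catch early instead of at AM launch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Self-contained illustration of a corrupt (truncated) token buffer;
// not the actual Credentials/readTokenStorageStream code.
public class TruncatedBufferDemo {
    static byte[] write(String token) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(1);      // token count, like a storage-stream header
        out.writeUTF(token);  // token payload
        out.close();
        return bos.toByteArray();
    }

    static void read(byte[] buf) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf));
        in.readInt();
        in.readUTF();         // throws EOFException if the buffer is truncated
    }

    // Eager validation: parse the buffer up front so a corrupt one rejects
    // the application at submission rather than failing later.
    static boolean isValid(byte[] buf) {
        try {
            read(buf);
            return true;
        } catch (IOException e) {  // EOFException is an IOException
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] good = write("delegation-token");
        byte[] corrupt = Arrays.copyOf(good, good.length - 3);
        System.out.println(isValid(good));     // true
        System.out.println(isValid(corrupt));  // false
    }
}
```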
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345755#comment-14345755 ] Jian He commented on YARN-3021: --- bq. Overall I think "automatic token renewal" has always been an "auxiliary service" provided by YARN's RM. I think this raises the point that DelegationTokenRenewal is just an auxiliary service, not a fundamental service required by YARN. RM today happens to be the renewer; in the long-term solution, we can point the renewer to a real centralized renewal service to support such cross-platform trust setups. Instead of explicitly adding a user-facing API and deprecating it in the future, we may choose to add a server-side config to not let the application fail if renewal fails. Thoughts? > YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). 
> In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345774#comment-14345774 ] Karthik Kambatla commented on YARN-2423: I propose we get this in. I understand the compatibility concern. My understanding is we would like to support the current APIs in TimelineClient with the new implementation as well. We can handle the new APIs added here along with them. To be on the safer side, we could annotate these methods as evolving and graduate them to stable if we continue to support them with the new implementation. Coming to the patch itself, I feel getEntity is a special case of getEntities. To limit the number of new APIs being added, can we get rid of it? [~rkanter], [~vinodkv] - thoughts? > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Robert Kanter > Attachments: YARN-2423.004.patch, YARN-2423.005.patch, > YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, > YARN-2423.patch > > > TimelineClient provides the Java method to put timeline entities. It's also > good to wrap over all GET APIs (both entity and domain), and deserialize the > json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2380) The normalizeRequests method in SchedulerUtils always resets the vCore to 1
[ https://issues.apache.org/jira/browse/YARN-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345776#comment-14345776 ] Wangda Tan commented on YARN-2380: -- [~kj-ki], Thought about it; similar problems exist in other methods like roundUp/divideAndCeil, etc., so we should probably make their behavior consistent. My proposal is: if we keep vcores when normalizing, we need to do the math on vcores when calling roundUp/divideAndCeil, etc. We will only ignore vcores when doing compare operations. Make sense? > The normalizeRequests method in SchedulerUtils always resets the vCore to 1 > --- > > Key: YARN-2380 > URL: https://issues.apache.org/jira/browse/YARN-2380 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Jian Fang >Priority: Critical > Attachments: YARN-2380.patch > > > I added some log info to the method normalizeRequest() as follows. > public static void normalizeRequest( > ResourceRequest ask, > ResourceCalculator resourceCalculator, > Resource clusterResource, > Resource minimumResource, > Resource maximumResource, > Resource incrementResource) { > LOG.info("Before request normalization, the ask capacity: " + > ask.getCapability()); > Resource normalized = > Resources.normalize( > resourceCalculator, ask.getCapability(), minimumResource, > maximumResource, incrementResource); > LOG.info("After request normalization, the ask capacity: " + normalized); > ask.setCapability(normalized); > } > The resulting log showed that the vcores in the ask were changed from 2 to 1. 
> 2014-08-01 20:54:15,537 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC > Server handler 4 on 9024): Before request normalization, the ask capacity: > > 2014-08-01 20:54:15,537 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC > Server handler 4 on 9024): After request normalization, the ask capacity: > > The root cause is the DefaultResourceCalculator calls > Resources.createResource(normalizedMemory) to regenerate a new resource with > vcore = 1. > This bug is critical and it leads to the mismatch of the request resource and > the container resource and many other potential issues if the user requests > containers with vcore > 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
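The reported behavior — DefaultResourceCalculator rebuilding the normalized resource via Resources.createResource(normalizedMemory), thereby defaulting vCores to 1 — can be reproduced in miniature. Everything below is an illustrative sketch, not the actual SchedulerUtils/Resources code:

```java
// Minimal sketch of the bug and the fix direction; names are illustrative.
public class NormalizeDemo {
    static final class Resource {
        final int memoryMb;
        final int vcores;
        Resource(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // Round memory up to the next multiple of the scheduler increment.
    static int roundUp(int value, int step) {
        return ((value + step - 1) / step) * step;
    }

    // Reported behavior: the normalized Resource is rebuilt from memory
    // alone, so the requested vCores are silently reset to 1.
    static Resource normalizeBuggy(Resource ask, int minMb, int incrMb) {
        int mem = roundUp(Math.max(ask.memoryMb, minMb), incrMb);
        return new Resource(mem, 1);
    }

    // Fix direction: carry the requested vCores through normalization.
    static Resource normalizeFixed(Resource ask, int minMb, int incrMb) {
        int mem = roundUp(Math.max(ask.memoryMb, minMb), incrMb);
        return new Resource(mem, ask.vcores);
    }

    public static void main(String[] args) {
        Resource ask = new Resource(1500, 2);
        System.out.println(normalizeBuggy(ask, 1024, 512).vcores); // 1
        System.out.println(normalizeFixed(ask, 1024, 512).vcores); // 2
    }
}
```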
[jira] [Created] (YARN-3290) DockerContainerExecutor should optionally limit memory and cpu
Abin Shahab created YARN-3290: - Summary: DockerContainerExecutor should optionally limit memory and cpu Key: YARN-3290 URL: https://issues.apache.org/jira/browse/YARN-3290 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Currently, DockerContainerExecutor does not set cgroup limits on memory and cpu. It should follow LCE's example to set cgroup limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
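As a sketch of what "following LCE's example" could look like at the docker level: docker exposes cgroup-backed limits through the real `--memory` and `--cpu-shares` flags of `docker run`, so the executor could append them when building the launch command. The helper below is purely illustrative (the class, method, image name, and values are assumptions, not DockerContainerExecutor code):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing where cgroup limit flags could be added to
// the docker run command line; not actual DockerContainerExecutor code.
public class DockerRunArgs {
    static List<String> buildRunCommand(String image, int memoryMb, int cpuShares) {
        return Arrays.asList(
            "docker", "run",
            "--memory=" + memoryMb + "m",  // hard memory limit (cgroup-enforced)
            "--cpu-shares=" + cpuShares,   // relative CPU weight (cgroup-enforced)
            image);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ",
            buildRunCommand("sequenceiq/hadoop-docker", 2048, 1024)));
    }
}
```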
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: Non-exclusive-Node-Partition-Design.pdf Attached same design doc for YARN-3214 (Non-exclusive node label) to umbrella ticket. > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, > Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.13.patch, > YARN-796.node-label.consolidate.14.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345798#comment-14345798 ] Hadoop QA commented on YARN-2981: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702254/YARN-2981.patch against trunk revision e17e5ba. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestDockerContainerExecutorWithMocks Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6822//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6822//console This message is automatically generated. > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch > > > This allows the yarn administrator to add a cluster-wide default docker image > that will be used when there are no per-job override of docker images. 
With > this feature, it would be convenient for newer applications like Slider to > launch inside a cluster-default docker container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3291) DockerContainerExecutor should run as a non-root user
Abin Shahab created YARN-3291: - Summary: DockerContainerExecutor should run as a non-root user Key: YARN-3291 URL: https://issues.apache.org/jira/browse/YARN-3291 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Currently DockerContainerExecutor runs containers as root. It should instead run them as a non-root user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-2190: Attachment: YARN-2190.10.patch bq. -1 patch. The patch command could not apply the patch. Not sure why it failed to apply. Attach a new patch generated on Linux. > Provide a Windows container executor that can limit memory and CPU > -- > > Key: YARN-2190 > URL: https://issues.apache.org/jira/browse/YARN-2190 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, > YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, > YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, > YARN-2190.9.patch > > > Yarn default container executor on Windows does not set the resource limit on > the containers currently. The memory limit is enforced by a separate > monitoring thread. The container implementation on Windows uses Job Object > right now. The latest Windows (8 or later) API allows CPU and memory limits > on the job objects. We want to create a Windows container executor that sets > the limits on job objects thus provides resource enforcement at OS level. > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345828#comment-14345828 ] Zhijie Shen commented on YARN-2423: --- It seems that Spark wants stable APIs. But we know that, based on the new data model, the APIs will change accordingly. Taking one step back, even if the APIs are not changed, we would usually be conservative enough to mark them \@Unstable in the release where they're pushed out. bq. Coming to the patch itself, I feel getEntity is a special case of getEntities. To limit the number of new APIs being added, can we get rid of it? They're wrapping different REST APIs: one is a get and one is a search, though we can narrow the result set down to one entity. If we want to move on with it, I suggest keeping both. > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Robert Kanter > Attachments: YARN-2423.004.patch, YARN-2423.005.patch, > YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, > YARN-2423.patch > > > TimelineClient provides the Java method to put timeline entities. It's also > good to wrap over all GET APIs (both entity and domain), and deserialize the > json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
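Zhijie's distinction — getEntity wraps a point-lookup REST call while getEntities wraps a search — can be sketched as follows. This is a Python stand-in over an in-memory store; the function shapes are illustrative only and are not the actual TimelineClient signatures under review.

```python
def get_entities(store, entity_type, limit=100):
    """Search: conceptually wraps GET /ws/v1/timeline/{type}, up to `limit` results."""
    return store.get(entity_type, [])[:limit]

def get_entity(store, entity_type, entity_id):
    """Point lookup: conceptually wraps GET /ws/v1/timeline/{type}/{id}."""
    for entity in store.get(entity_type, []):
        if entity["entity"] == entity_id:
            return entity
    return None

# Tiny in-memory stand-in for the timeline store, keyed by entity type.
store = {"DAG": [{"entity": "dag_1"}, {"entity": "dag_2"}]}
```

The point of keeping both: the search can be narrowed to one entity, but the two calls hit different server-side paths with different semantics (filtering vs. direct lookup).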
[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-2981: -- Attachment: YARN-2981.patch Another fix > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch, > YARN-2981.patch > > > This allows the yarn administrator to add a cluster-wide default docker image > that will be used when there are no per-job override of docker images. With > this features, it would be convenient for newer applications like slider to > launch inside a cluster-default docker container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3285) Convert branch-2 .apt.vm files of YARN to markdown
[ https://issues.apache.org/jira/browse/YARN-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345821#comment-14345821 ] Masatake Iwasaki commented on YARN-3285: Thanks, [~jianhe]. I agree to keep focus only on conversion here and address your comments in follow-ups. > Convert branch-2 .apt.vm files of YARN to markdown > -- > > Key: YARN-3285 > URL: https://issues.apache.org/jira/browse/YARN-3285 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: YARN-3285.001.patch > > > Backport the conversion to markdown done in YARN-3168. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345836#comment-14345836 ] Hitesh Shah commented on YARN-2423: --- [~kasha] [~vinodkv] Will this API be backported all the way to 2.4 and to Hadoop maintenance releases for each line? Or will it only be available from 2.7.0 onwards? If the latter, is there really a need to publish an obsolete API for just one release, assuming that Timeline v2 will be ready in time for 2.8? > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Robert Kanter > Attachments: YARN-2423.004.patch, YARN-2423.005.patch, > YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, > YARN-2423.patch > > > TimelineClient provides the Java method to put timeline entities. It's also > good to wrap over all GET APIs (both entity and domain), and deserialize the > json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345852#comment-14345852 ] Anubhav Dhoot commented on YARN-3242: - LGTM > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > A watcher event from an old ZK client session can mess up the new ZK client session, because > ZooKeeper closes client sessions asynchronously. > A watcher event from the old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This causes a serious problem: ZKRMStateStore gets out of sync with the ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher > event is from the current session, so a watcher event from an old ZK client > session that was just closed will still be processed. > For example, if a Disconnected event is received from the old session after the new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive a SyncConnected event from the new session, > because the new session is already in the SyncConnected state and it won't send a > SyncConnected event until it is disconnected and connected again. > Then we will see all ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. 
> The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h, null, null, null); > } catch (InterruptedException e) { > // ignore, close the send/event threads > } finally { > disconnect(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
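The direction of the fix discussed in this issue — ignore watcher callbacks that belong to an expired client session — can be reduced to a session-id check at the top of the event handler. This is a minimal sketch of that idea, not the actual ZKRMStateStore code or API:

```python
class StateStoreWatcher:
    """Drops watcher events that belong to a stale (already-closed) ZK session.

    Sketch only: remembers which session is current and ignores callbacks
    delivered on behalf of any other session, since ZooKeeper's EventThread
    keeps draining queued events even after eventOfDeath is queued.
    """

    def __init__(self):
        self.active_session = None
        self.processed = []

    def new_session(self, session_id):
        # Called after a new ZK client connects; older sessions become stale.
        self.active_session = session_id

    def process_watch_event(self, session_id, event):
        if session_id != self.active_session:
            return False  # stale event from a closed session: ignore it
        self.processed.append(event)
        return True

watcher = StateStoreWatcher()
watcher.new_session(2)
ignored = watcher.process_watch_event(1, "Disconnected")   # old session's event
handled = watcher.process_watch_event(2, "SyncConnected")  # current session's event
```

With this guard, the stray Disconnected event from the old session can no longer null out the current zkClient.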
[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345847#comment-14345847 ] Hadoop QA commented on YARN-3242: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702245/YARN-3242.004.patch against trunk revision e17e5ba. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1236 javac compiler warnings (more than the trunk's current 1199 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 34 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6824//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws: org.apache.hadoop.ha.TestZKFailoverController org.apache.hadoop.ha.TestZKFailoverControllerStress Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6824//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6824//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6824//console This message is automatically generated. > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. 
> > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > Old ZK client session watcher event messed up new ZK client session due to > ZooKeeper asynchronously closing client session. > The watcher event from old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This will cause seriously problem:ZKRMStateStore out of sync with ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher > event is from current session. So the watcher event from old ZK client > session which just is closed will still be processed. > For example, If a Disconnected event received from old session after new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive SyncConnected event from new session > because new session is already in SyncConnected state and it won't send > SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. > The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. 
> {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event >
[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3242: Attachment: (was: YARN-3242.004.patch) > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch > > > Old ZK client session watcher event messed up new ZK client session due to > ZooKeeper asynchronously closing client session. > The watcher event from old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This will cause seriously problem:ZKRMStateStore out of sync with ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher > event is from current session. So the watcher event from old ZK client > session which just is closed will still be processed. > For example, If a Disconnected event received from old session after new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive SyncConnected event from new session > because new session is already in SyncConnected state and it won't send > SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. 
> The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h, null, null, null); > } catch (InterruptedException e) { > // ignore, close the send/event threads > } finally { > disconnect(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3242: Attachment: YARN-3242.004.patch > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > Old ZK client session watcher event messed up new ZK client session due to > ZooKeeper asynchronously closing client session. > The watcher event from old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This will cause seriously problem:ZKRMStateStore out of sync with ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher > event is from current session. So the watcher event from old ZK client > session which just is closed will still be processed. > For example, If a Disconnected event received from old session after new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive SyncConnected event from new session > because new session is already in SyncConnected state and it won't send > SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. 
> The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h, null, null, null); > } catch (InterruptedException e) { > // ignore, close the send/event threads > } finally { > disconnect(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345858#comment-14345858 ] Hadoop QA commented on YARN-2190: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702264/YARN-2190.10.patch against trunk revision e17e5ba. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6826//console This message is automatically generated. > Provide a Windows container executor that can limit memory and CPU > -- > > Key: YARN-2190 > URL: https://issues.apache.org/jira/browse/YARN-2190 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, > YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, > YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, > YARN-2190.9.patch > > > Yarn default container executor on Windows does not set the resource limit on > the containers currently. The memory limit is enforced by a separate > monitoring thread. The container implementation on Windows uses Job Object > right now. The latest Windows (8 or later) API allows CPU and memory limits > on the job objects. We want to create a Windows container executor that sets > the limits on job objects thus provides resource enforcement at OS level. > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345878#comment-14345878 ] Robert Kanter commented on YARN-2423: - Even if ATS v2 makes it for 2.8, I imagine it won't be fully stable; I'm assuming we're not going to suddenly throw away the old ATS in 2.8, right? We have to mark it deprecated and leave it in for a few releases, especially if the new ATS isn't 100% ready yet. We seem to be assuming that this API won't be compatible with the new ATS. I agree; it likely won't. That also means that the REST API won't either. So, regardless of whether or not we add this Java API, users will have to rewrite their code to use a new API. Given that, it's a lot easier and cleaner for Java users to rewrite their code from one Java API to another Java API, than it is to rewrite their custom wrapped REST API and JSON handling code (which they also have to write themselves) to a new Java API (and that's assuming we ship a new Java API in 2.8; what if we only have another REST API? That would be even harder for users). And if the API turns out to be compatible, then we're already ahead of the game. Either way, it seems like it would be easier for everyone if we put in this API, even if it's going to be "obsolete". Keep in mind that even if it only lasts one release, not everyone updates their cluster every time a new Hadoop release is out. There are many users who will stay on 2.7 (or distributions derived therefrom) for quite a while and would benefit from this API in the meantime. 
> TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Robert Kanter > Attachments: YARN-2423.004.patch, YARN-2423.005.patch, > YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, > YARN-2423.patch > > > TimelineClient provides the Java method to put timeline entities. It's also > good to wrap over all GET APIs (both entity and domain), and deserialize the > json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345887#comment-14345887 ] Karthik Kambatla commented on YARN-2423: [~hitesh] - in my optimistic opinion, Timeline v2 is at least 6 months out. Ideally, I would like for 2.8 to have come out before then, but that is besides the point. And, as Robert mentioned, I suspect it will take us at least another 3 months to stabilize it enough to recommend it over Timeline v1. I feel it is only reasonable to provide a way for downstream apps to use the existing ATS until then. Zhijie's suggestion of marking it Unstable and adding comments to capture the reason seems like a good approach to me. > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Robert Kanter > Attachments: YARN-2423.004.patch, YARN-2423.005.patch, > YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, > YARN-2423.patch > > > TimelineClient provides the Java method to put timeline entities. It's also > good to wrap over all GET APIs (both entity and domain), and deserialize the > json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345890#comment-14345890 ] Karthik Kambatla commented on YARN-2423: I am open to including this in point releases based on 2.4, 2.5 and 2.6 when they come out. > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Robert Kanter > Attachments: YARN-2423.004.patch, YARN-2423.005.patch, > YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, > YARN-2423.patch > > > TimelineClient provides the Java method to put timeline entities. It's also > good to wrap over all GET APIs (both entity and domain), and deserialize the > json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3222: - Attachment: 0005-YARN-3222.patch > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected,RMNodeImpl#ReconnectNodeTransition notifies the > scheduler in a events node_added,node_removed or node_resource_update. These > events should be notified in an sequential order i.e node_added event and > next node_resource_update events. > But if the node is reconnected with different http port, the oder of > scheduler events are node_removed --> node_resource_update --> node_added > which causes scheduler does not find the node and throw NPE and RM exit. > Node_Resource_update event should be always should be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345909#comment-14345909 ] Rohith commented on YARN-3222: -- bq. check you added earlier about sending NodeResourceUpdate event only if the node resource is different Agreed. Updated the patch to address the above comment. Kindly review it. > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler via the events node_added, node_removed, or node_resource_update. These > events should be sent in sequential order, i.e. the node_added event first and > the node_resource_update event next. > But if the node is reconnected with a different HTTP port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit. > The node_resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
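The behavior being agreed on above can be sketched as follows. This is a minimal, self-contained sketch, not the actual patch: `SchedulerEventType` and the `reconnect` helper are simplified stand-ins for the real RMNodeImpl/scheduler types, and node capabilities are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simplified stand-in for the real YARN scheduler event types.
enum SchedulerEventType { NODE_REMOVED, NODE_ADDED, NODE_RESOURCE_UPDATE }

public class ReconnectSketch {
    public static List<SchedulerEventType> reconnect(boolean httpPortChanged,
                                                     String oldCapability,
                                                     String newCapability) {
        List<SchedulerEventType> events = new ArrayList<>();
        if (httpPortChanged) {
            // The node is treated as a brand-new node: remove then add.
            // NODE_ADDED already carries the new capability, so no separate
            // NODE_RESOURCE_UPDATE is sent for a node the scheduler just added.
            events.add(SchedulerEventType.NODE_REMOVED);
            events.add(SchedulerEventType.NODE_ADDED);
        } else if (!Objects.equals(oldCapability, newCapability)) {
            // Same node identity: notify the scheduler only when the
            // reported resource actually differs from the old one.
            events.add(SchedulerEventType.NODE_RESOURCE_UPDATE);
        }
        return events;
    }

    public static void main(String[] args) {
        // Port changed: remove -> add, never an update for an unknown node
        // (the NPE scenario from the description).
        System.out.println(reconnect(true, "8G", "16G"));  // [NODE_REMOVED, NODE_ADDED]
        // Same port, unchanged capability: nothing to send.
        System.out.println(reconnect(false, "8G", "8G"));  // []
        // Same port, new capability: a single resource update.
        System.out.println(reconnect(false, "8G", "16G")); // [NODE_RESOURCE_UPDATE]
    }
}
```

The key property is that the scheduler never receives a NODE_RESOURCE_UPDATE for a node it has not (re-)added first.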
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN_3267_WIP3.patch > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, > YARN_3267_WIP2.patch, YARN_3267_WIP3.patch > > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345925#comment-14345925 ] Hadoop QA commented on YARN-2981: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702266/YARN-2981.patch against trunk revision e17e5ba. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6825//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6825//console This message is automatically generated. > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch, > YARN-2981.patch > > > This allows the YARN administrator to add a cluster-wide default Docker image > that will be used when there is no per-job override of the Docker image. 
With > this feature, it would be convenient for newer applications like Slider to > launch inside a cluster-default Docker container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
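For illustration, such a cluster-wide default might be set in yarn-site.xml as in the snippet below. The property key and image name here are assumptions made for the example; the actual key is defined by the patch attached to this JIRA.

```xml
<!-- Illustrative only: the exact property key comes from the YARN-2981 patch. -->
<property>
  <name>yarn.nodemanager.docker-container-executor.image-name</name>
  <value>example/hadoop-base:latest</value>
  <description>
    Default Docker image used by DockerContainerExecutor when a job
    does not specify its own image.
  </description>
</property>
```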
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345950#comment-14345950 ] Gera Shegalov commented on YARN-2893: - Hi [~zxu], it's great that you're making progress on this JIRA. Any chance you can capture the failure scenarios in a unit test, so we can relate them better to the real failures we are seeing? > AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream > -- > > Key: YARN-2893 > URL: https://issues.apache.org/jira/browse/YARN-2893 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: zhihai xu > Attachments: YARN-2893.000.patch > > > MapReduce jobs on our clusters experience sporadic failures due to corrupt > tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345963#comment-14345963 ] Vinod Kumar Vavilapalli commented on YARN-2423: --- bq. Zhijie's suggestion of marking it Unstable and adding comments to capture the reason seems like a good approach to me. I thought the problem for Spark per [~vanzin] was that they cannot depend on non-public or public-Unstable stuff, no? > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Robert Kanter > Attachments: YARN-2423.004.patch, YARN-2423.005.patch, > YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, > YARN-2423.patch > > > TimelineClient provides the Java method to put timeline entities. It would also > be good to wrap all GET APIs (both entity and domain) and deserialize the > JSON responses into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345966#comment-14345966 ] zhihai xu commented on YARN-2893: - Hi [~jira.shegalov], that is a very good suggestion. Yes, I will look into writing a test case for this failure. Thanks, zhihai > AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream > -- > > Key: YARN-2893 > URL: https://issues.apache.org/jira/browse/YARN-2893 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: zhihai xu > Attachments: YARN-2893.000.patch > > > MapReduce jobs on our clusters experience sporadic failures due to corrupt > tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345976#comment-14345976 ] Marcelo Vanzin commented on YARN-2423: -- We'd rather not depend on unstable APIs. But in this context, what does "Unstable" mean? When ATS v2 is released, will all support for ATS v1 be removed? Are you going to change all the APIs to work against v2, making code built against v1 effectively broken? I'd imagine that if v2 is really incompatible you'd add a new set of APIs and then deprecate v1 instead. The v1 APIs would be public, stable, and deprecated at that point. > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Robert Kanter > Attachments: YARN-2423.004.patch, YARN-2423.005.patch, > YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, > YARN-2423.patch > > > TimelineClient provides the Java method to put timeline entities. It would also > be good to wrap all GET APIs (both entity and domain) and deserialize the > JSON responses into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345987#comment-14345987 ] Hadoop QA commented on YARN-3222: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702276/0005-YARN-3222.patch against trunk revision e17e5ba. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1151 javac compiler warnings (more than the trunk's current 185 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-distcp. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6828//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6828//console This message is automatically generated. 
> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler via the events node_added, node_removed, or node_resource_update. These > events should be sent in sequential order, i.e. the node_added event first and > the node_resource_update event next. > But if the node is reconnected with a different HTTP port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit. > The node_resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345998#comment-14345998 ] Vinod Kumar Vavilapalli commented on YARN-2893: --- Great progress, [~zxu]! Your explanation sounds like this error should always happen. Do you know why we are only seeing it sporadically? Are there special conditions when this happens? > AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream > -- > > Key: YARN-2893 > URL: https://issues.apache.org/jira/browse/YARN-2893 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: zhihai xu > Attachments: YARN-2893.000.patch > > > MapReduce jobs on our clusters experience sporadic failures due to corrupt > tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345996#comment-14345996 ] Chuan Liu commented on YARN-2190: - bq. -1 patch. The patch command could not apply the patch. Not sure what the problem is. I can apply the patch on both Windows and Linux with '{{patch -p0 < YARN-2190.10.patch}}'. > Provide a Windows container executor that can limit memory and CPU > -- > > Key: YARN-2190 > URL: https://issues.apache.org/jira/browse/YARN-2190 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, > YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, > YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, > YARN-2190.9.patch > > > The YARN default container executor on Windows does not currently set resource limits on > the containers. The memory limit is enforced by a separate > monitoring thread. The container implementation on Windows uses Job Objects > right now. The latest Windows (8 or later) API allows CPU and memory limits > on job objects. We want to create a Windows container executor that sets > the limits on job objects, thus providing resource enforcement at the OS level. > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346006#comment-14346006 ] Anubhav Dhoot commented on YARN-3122: - Those changes look good to me. As a sample, when I run stress -c 3 using distributed shell on a 4-core machine with 8 vcores configured, 3 cores are consumed, which maps to 6 vcores. Thus PCpu would be approx 300% (similar to top) and MilliVcoresUsed would be approx 6000, which is what we see below in the actual metrics on the NodeManager {noformat} hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -debug -shell_command "stress -c 3" -jar ../share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar -container_memory 350 -master_memory 350 {noformat} {noformat} }, { "name" : "Hadoop:service=NodeManager,name=ContainerResource_container_1425421474415_0003_01_02", "modelerType" : "ContainerResource_container_1425421474415_0003_01_02", "tag.ContainerResource" : "container_1425421474415_0003_01_02", "tag.Context" : "container", "tag.ContainerPid" : "10095", "tag.Hostname" : "anuonebox.ent.cloudera.com", "PMemUsageMBsNumUsage" : 23, "PMemUsageMBsAvgMBs" : 2.0, "PMemUsageMBsStdevMBs" : 0.0, "PMemUsageMBsIMinMBs" : 2.0, "PMemUsageMBsIMaxMBs" : 2.0, "PMemUsageMBsMinMBs" : 2.0, "PMemUsageMBsMaxMBs" : 2.0, "PCpuUsagePercentNumUsage" : 23, "PCpuUsagePercentAvgPercents" : 284.304347826087, "PCpuUsagePercentStdevPercents" : 62.196488341829514, "PCpuUsagePercentIMinPercents" : -1.0, "PCpuUsagePercentIMaxPercents" : 298.0, "PCpuUsagePercentMinPercents" : -1.0, "PCpuUsagePercentMaxPercents" : 298.0, "MilliVcoreUsageNumUsage" : 23, "MilliVcoreUsageAvgMilliVcores" : 5694.782608695651, "MilliVcoreUsageStdevMilliVcores" : 1245.8097752255082, "MilliVcoreUsageIMinMilliVcores" : -20.0, "MilliVcoreUsageIMaxMilliVcores" : 5971.0, "MilliVcoreUsageMinMilliVcores" : -20.0, "MilliVcoreUsageMaxMilliVcores" : 5971.0, "pMemLimitMBs" : 
512, "vMemLimitMBs" : 1075, "vCoreLimit" : 1 } ] } {noformat} > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, > YARN-3122.prelim.patch, YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
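The arithmetic behind the sample numbers above can be checked with a small helper. This is a hypothetical illustration of the conversion, not the patch's actual code: physical CPU percent (as reported by top) is scaled by the node's vcore-to-physical-core ratio to get milli-vcores.

```java
public class MilliVcoreSketch {
    // percent of a physical core -> milli-vcores, scaled by the
    // vcore/physical-core ratio configured on the node.
    public static long milliVcores(double pCpuPercent, int physicalCores, int configuredVcores) {
        double coresUsed = pCpuPercent / 100.0;                          // 300% -> 3 cores
        double vcoresUsed = coresUsed * configuredVcores / (double) physicalCores;
        return Math.round(vcoresUsed * 1000.0);                          // -> milli-vcores
    }

    public static void main(String[] args) {
        // stress -c 3 on a 4-core box configured with 8 vcores:
        System.out.println(milliVcores(300.0, 4, 8)); // prints 6000
    }
}
```

This matches the observed metrics: PCpuUsagePercent near 300 and MilliVcoreUsage near 6000.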
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346013#comment-14346013 ] Rohith commented on YARN-3222: -- Had a glance at the javac and javadoc warnings; they look unrelated to this patch. > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler via the events node_added, node_removed, or node_resource_update. These > events should be sent in sequential order, i.e. the node_added event first and > the node_resource_update event next. > But if the node is reconnected with a different HTTP port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit. > The node_resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3292) [Umbrella] Tests and/or tools for YARN backwards compatibility verification
Li Lu created YARN-3292: --- Summary: [Umbrella] Tests and/or tools for YARN backwards compatibility verification Key: YARN-3292 URL: https://issues.apache.org/jira/browse/YARN-3292 Project: Hadoop YARN Issue Type: Improvement Reporter: Li Lu Assignee: Li Lu YARN-666 added support for YARN rolling upgrades. In order to support this feature, we made changes from many perspectives, and many assumptions were made together with these existing changes. Future code changes may break these assumptions by accident, and hence break the YARN rolling upgrades feature. To simplify YARN RU regression testing, we may want to create a set of tools/tests that can verify YARN RU backward compatibility. As a first step, we may want to have a compatibility checker for important protocols and APIs. We may also want to incorporate these tools into our test Jenkins runs, if necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346048#comment-14346048 ] Hadoop QA commented on YARN-3242: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702269/YARN-3242.004.patch against trunk revision e17e5ba. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6827//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6827//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6827//console This message is automatically generated. > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > A watcher event from an old ZK client session messed up the new ZK client session, due to > ZooKeeper asynchronously closing the client session. > A watcher event from the old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This causes a serious problem: ZKRMStateStore goes out of sync with the ZooKeeper > session. > We only have one ZKRMStateStore, but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher > event is from the current session. 
So a watcher event from an old ZK client > session which was just closed will still be processed. > For example, if a Disconnected event is received from the old session after the new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive a SyncConnected event from the new session, > because the new session is already in the SyncConnected state and it won't send > a SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. > The following code from ZooKeeper (ClientCnxn#EventThread) shows that even after > receiving eventOfDeath, the EventThread will still process all the events until the > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; >
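One way to guard against this, sketched here with hypothetical names rather than the actual ZKRMStateStore fields, is to tag each watcher event with the session it came from and drop events whose session is no longer the active one, instead of letting a stale Disconnected event null out the current client:

```java
public class SessionFilterSketch {
    // Hypothetical stand-in for a ZK client handle; only the session id matters here.
    public static class ZkSession {
        public final long sessionId;
        public ZkSession(long id) { sessionId = id; }
    }

    public ZkSession activeSession = new ZkSession(1L);
    public int processed = 0;

    // Only act on watcher events originating from the session the store
    // currently considers active; events from an old, closed session are
    // ignored rather than mutating the current client state.
    public void processWatchEvent(ZkSession source) {
        if (activeSession == null || source.sessionId != activeSession.sessionId) {
            return; // stale event from a closed session: drop it
        }
        processed++;
    }

    public static void main(String[] args) {
        SessionFilterSketch store = new SessionFilterSketch();
        ZkSession old = store.activeSession;
        store.processWatchEvent(old);            // current session: handled
        store.activeSession = new ZkSession(2L); // store reconnects with a new session
        store.processWatchEvent(old);            // Disconnected from old session: ignored
        System.out.println(store.processed);     // prints 1
    }
}
```

With this check, the late Disconnected event from the closed session can no longer put the store into the "Wait for ZKClient creation timed out" state described above.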
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346058#comment-14346058 ] Jian He commented on YARN-3249: --- [~ryu_kobayashi], thanks for your work! Here, can this be directly routed to the RMWebService "/ws/v1/cluster/apps/{appid}/state"? {code} .$onclick(String.format("confirmAction('%s')", url(String.format("/killapp/%s", aid {code} > Add the kill application to the Resource Manager Web UI > --- > > Key: YARN-3249 > URL: https://issues.apache.org/jira/browse/YARN-3249 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0, 2.7.0 >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Minor > Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, > YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.patch, killapp-failed.log, > killapp-failed2.log, screenshot.png, screenshot2.png > > > We want to be able to kill an application from the Web UI, similar to the JobTracker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)