[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948161#comment-14948161 ] Varun Saxena commented on YARN-4237: I have started. In fact, I would have updated the patch for YARN-4179 yesterday itself, but I was facing some issues with the tests. We take the system time to set a flow's activity time, so in that case I cannot write deterministic tests. Maybe I can use PowerMock to mock System.currentTimeMillis(), which is a static method. > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
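[Editor's note] An alternative to mocking the static call is to inject the time source, which makes such tests deterministic without PowerMock. A minimal sketch in plain Java; the class and method names here are illustrative, not from the actual patch:

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: instead of calling System.currentTimeMillis() directly,
// the flow-activity code takes an injectable time source, so a test can supply
// a fixed value. FlowActivityTimestamp is an illustrative name.
class FlowActivityTimestamp {
    private final LongSupplier clock;

    FlowActivityTimestamp(LongSupplier clock) {
        this.clock = clock;
    }

    // Production code would construct this with System::currentTimeMillis;
    // a test passes () -> someFixedMillis instead.
    long activityTime() {
        return clock.getAsLong();
    }
}
```

A test can then assert on an exact timestamp instead of mocking a static method.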
[jira] [Commented] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948134#comment-14948134 ] Rohith Sharma K S commented on YARN-261: Attached a patch fixing the review comments. The attached patch implements failing an attempt. Regarding killing an attempt, I tried some initial prototyping, but it will take some time. I would prefer this to go in, and to take up killing an attempt later in a different JIRA if required. [~jlowe], the attached patch 0002-YARN-261.patch is the latest one. Kindly review it. > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > Project: Hadoop YARN > Issue Type: New Feature > Components: api >Affects Versions: 2.0.3-alpha >Reporter: Jason Lowe >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, > YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, > YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch > > > It would be nice if clients could ask for an AM attempt to be killed. This > is analogous to the task attempt kill support provided by MapReduce. > This feature would be useful in a scenario where AM retries are enabled, the > AM supports recovery, and a particular AM attempt is stuck. Currently if > this occurs the user's only recourse is to kill the entire application, > requiring them to resubmit a new application and potentially breaking > downstream dependent jobs if it's part of a bigger workflow. Killing the > attempt would allow a new attempt to be started by the RM without killing the > entire application, and if the AM supports recovery it could potentially save > a lot of work. It could also be useful in workflow scenarios where the > failure of the entire application kills the workflow, but the ability to kill > an attempt can keep the workflow going if the subsequent attempt succeeds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948124#comment-14948124 ] Rohith Sharma K S commented on YARN-261: Updated the issue description as per the patch implementation. > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-261: --- Attachment: 0002-YARN-261.patch > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-261: --- Attachment: (was: 0002-YARN-261.patch) > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-261: --- Summary: Ability to fail AM attempts (was: Ability to kill AM attempts) > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-261) Ability to kill AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-261: --- Attachment: 0002-YARN-261.patch > Ability to kill AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster
[ https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948093#comment-14948093 ] Jun Gong commented on YARN-4201: Could anyone help review it, please? > AMBlacklist does not work for minicluster > - > > Key: YARN-4201 > URL: https://issues.apache.org/jira/browse/YARN-4201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-4021.001.patch > > > For a minicluster (scheduler.include-port-in-node-name is set to TRUE), > AMBlacklist does not work. This is because RM puts only the host into the AMBlacklist, > whether scheduler.include-port-in-node-name is set or not. RM should instead > put "host + port" into the AMBlacklist when it is set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
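[Editor's note] The rule the bug report describes can be sketched in a few lines. This is a hedged illustration, not the actual patch code; BlacklistEntry and nodeName are hypothetical names. When the include-port setting is on (as in a minicluster, where several NodeManagers share one host), the scheduler identifies nodes as host:port, so the blacklist key must carry the port to match:

```java
// Illustrative sketch of the YARN-4201 rule; names are hypothetical.
class BlacklistEntry {
    // When include-port-in-node-name is true, the scheduler keys nodes by
    // "host:port", so a blacklist entry of just "host" would never match.
    static String nodeName(String host, int port, boolean includePortInNodeName) {
        return includePortInNodeName ? host + ":" + port : host;
    }
}
```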
[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948051#comment-14948051 ] Naganarasimha G R commented on YARN-4237: - Hi [~varun_saxena], if you have not already started on this or YARN-4179, can I work on them? > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948044#comment-14948044 ] Naganarasimha G R commented on YARN-3367: - bq. There is still no hard guarantee that they will be received by the server in the same order. But as per the current patch, if events are queued and sent one by one from the client, then the server should receive them in the same order, right? Basically it is not a multi-threaded dispatcher; all events are queued and dispatched in a single separate thread. > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Attachments: YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we added a loop in TimelineClient to wait for > collectorServiceAddress to be ready before posting any entity. In consumers of > TimelineClient (like the AM), we are starting a new thread for each call to get > rid of a potential deadlock in the main thread. This approach has at least 3 major > defects: > 1. The consumer needs some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It costs many thread resources, which is unnecessary. > 3. The sequence of events could be out of order because each posting > thread gets out of the waiting loop randomly. > We should have something like an event loop on the TimelineClient side: > putEntities() only puts the related entities into a queue, and a > separate thread delivers the entities in the queue to the collector via REST > calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
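[Editor's note] The ordering argument above can be demonstrated with a stripped-down sketch: producers only enqueue, and a single dispatcher thread drains the queue, so entities leave the client in exactly FIFO order. This is an illustration of the pattern, not the actual TimelineClient code; all names are hypothetical, and the list append stands in for the REST call to the collector.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a single-threaded event loop preserving send order.
class OrderedDispatcher {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Stand-in for "delivered to the collector"; thread-safe for the demo.
    final List<String> delivered = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(() -> {
        try {
            while (true) {
                // Single consumer: entities are delivered in queue (FIFO) order.
                delivered.add(queue.take()); // stands in for the REST call
            }
        } catch (InterruptedException e) {
            // interrupted: shutting down
        }
    });

    OrderedDispatcher() { worker.start(); }

    // Analogous to putEntities(): the caller never blocks on delivery.
    void putEntity(String entity) { queue.offer(entity); }

    void stop() throws InterruptedException {
        worker.interrupt();
        worker.join();
    }
}
```

Because there is only one consumer thread, no two deliveries can race each other, which is the point being made about the patch.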
[jira] [Commented] (YARN-4238) createdTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948041#comment-14948041 ] Naganarasimha G R commented on YARN-4238: - Hi [~varun_saxena], similar to V1, we have already captured this information in TimelineServiceV2Publisher too; it is captured as part of the events of those entities. If you want it to be part of the entity, I can add it there, but IMO it makes more sense as part of the event, as the different states of an application (basically any entity) are captured as events. > createdTime is not reported while publishing entities to ATSv2 > -- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948032#comment-14948032 ] Hadoop QA commented on YARN-3943: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 2s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 28s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 51s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 8m 46s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 59m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765505/YARN-3943.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 35affec | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9375/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9375/console | This message was automatically generated. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. 
Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
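[Editor's note] The two-threshold idea described above is classic hysteresis. A minimal sketch follows; the class name, constructor arguments, and example values are illustrative, not the proposed configuration keys:

```java
// Hypothetical sketch of disk-full detection with hysteresis (YARN-3943 idea).
class DiskHealth {
    private final float fullThreshold;     // e.g. 90.0f (%): declare disk full
    private final float notFullThreshold;  // e.g. 85.0f (%): declare disk good again
    private boolean full = false;

    DiskHealth(float fullThreshold, float notFullThreshold) {
        this.fullThreshold = fullThreshold;
        this.notFullThreshold = notFullThreshold;
    }

    // Returns true while the disk is considered full. A utilization reading
    // between the two thresholds keeps the current state, so the disk does
    // not oscillate around a single cutoff.
    boolean update(float utilizationPercent) {
        if (!full && utilizationPercent > fullThreshold) {
            full = true;
        } else if (full && utilizationPercent < notFullThreshold) {
            full = false;
        }
        return full;
    }
}
```

Setting the disk-full threshold higher than the disk-not-full threshold, as the description suggests, is exactly what creates the stabilizing gap.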
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947996#comment-14947996 ] zhihai xu commented on YARN-3943: - Thanks [~jlowe]! Yes, the comments are great. Nice catch on the backwards-compatibility problem! I uploaded a new patch, YARN-3943.002.patch, which addresses all your comments. Please review it. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3943: Attachment: YARN-3943.002.patch > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4239) Flow page for Web UI
Varun Saxena created YARN-4239: -- Summary: Flow page for Web UI Key: YARN-4239 URL: https://issues.apache.org/jira/browse/YARN-4239 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4238) createdTime is not reported while publishing entities to ATSv2
Varun Saxena created YARN-4238: -- Summary: createdTime is not reported while publishing entities to ATSv2 Key: YARN-4238 URL: https://issues.apache.org/jira/browse/YARN-4238 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Saxena Assignee: Varun Saxena While publishing entities from RM and elsewhere we are not sending created time. For instance, created time in TimelineServiceV2Publisher class and for other entities in other such similar classes is not updated. We can easily update created time when sending application created event. Likewise for modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947847#comment-14947847 ] Eric Payne commented on YARN-3769: -- Investigating test failures and checkstyle warnings > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-3769-branch-2.002.patch, > YARN-3769-branch-2.7.002.patch, YARN-3769.001.branch-2.7.patch, > YARN-3769.001.branch-2.8.patch > > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4237: --- Issue Type: Sub-task (was: Bug) Parent: YARN-2928 > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947833#comment-14947833 ] Varun Saxena commented on YARN-4237: - Given a time range and user name, list flow runs in a given flow id. - Given a time range and user name, list all flows in a given cluster. The date range will be supported in YARN-4179. Do we need to support user? > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4237) Support additional queries for ATSv2 Web UI
Varun Saxena created YARN-4237: -- Summary: Support additional queries for ATSv2 Web UI Key: YARN-4237 URL: https://issues.apache.org/jira/browse/YARN-4237 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
[ https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947822#comment-14947822 ] Hadoop QA commented on YARN-4235: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 16s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 56m 36s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 96m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765465/YARN-4235.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6dd47d7 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9373/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9373/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9373/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9373/console | This message was automatically generated. > FairScheduler PrimaryGroup does not handle empty groups returned for a user > > > Key: YARN-4235 > URL: https://issues.apache.org/jira/browse/YARN-4235 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-4235.001.patch > > > We see NPE if empty groups are returned for a user. 
This causes a NPE and > cause RM to crash as below > {noformat} > 2015-09-22 16:51:52,780 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:3212) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2015-09-22 16:51:52,797 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
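[Editor's note] The fix implied by the trace above is a guard before taking the first group. A hedged sketch, not the actual QueuePlacementRule code (the class and method names are hypothetical): when the group list is empty, return no assignment so placement can fall through to the next rule instead of throwing.

```java
import java.util.List;

// Hypothetical sketch of guarding the primary-group placement rule.
class PrimaryGroupRule {
    // Returns the queue for the user's primary group, or null when the user
    // has no groups, letting the caller fall through to the next rule
    // instead of crashing on groups.get(0).
    static String assignQueue(List<String> groups) {
        if (groups == null || groups.isEmpty()) {
            return null;
        }
        return "root." + groups.get(0);
    }
}
```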
[jira] [Updated] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
[ https://issues.apache.org/jira/browse/YARN-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4233: Description: Copy the description from YARN-3942: This adds a new timeline store plugin that is intended as a stop-gap measure to mitigate some of the issues we've seen with ATS v1 while waiting for ATS v2. The intent of this plugin is to provide a workable solution for running the Tez UI against the timeline server on a large-scale clusters running many thousands of jobs per day. > YARN Timeline Service plugin: ATS v1.5 > -- > > Key: YARN-4233 > URL: https://issues.apache.org/jira/browse/YARN-4233 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > Copy the description from YARN-3942: > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on a large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
[ https://issues.apache.org/jira/browse/YARN-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4233: Description: Copy the description from YARN-3942: Timeline store to read events from HDFS This adds a new timeline store plugin that is intended as a stop-gap measure to mitigate some of the issues we've seen with ATS v1 while waiting for ATS v2. The intent of this plugin is to provide a workable solution for running the Tez UI against the timeline server on a large-scale clusters running many thousands of jobs per day. was: Copy the description from YARN-3942: This adds a new timeline store plugin that is intended as a stop-gap measure to mitigate some of the issues we've seen with ATS v1 while waiting for ATS v2. The intent of this plugin is to provide a workable solution for running the Tez UI against the timeline server on a large-scale clusters running many thousands of jobs per day. > YARN Timeline Service plugin: ATS v1.5 > -- > > Key: YARN-4233 > URL: https://issues.apache.org/jira/browse/YARN-4233 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > Copy the description from YARN-3942: Timeline store to read events from HDFS > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on a large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
[ https://issues.apache.org/jira/browse/YARN-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947699#comment-14947699 ] Lars Francke commented on YARN-4233: Hi Xuan, could you add a bit of detail to this issue? I'm trying to understand what's going on but don't really know what it's about. Thank you! > YARN Timeline Service plugin: ATS v1.5 > -- > > Key: YARN-4233 > URL: https://issues.apache.org/jira/browse/YARN-4233 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4236) Metric for aggregated resources allocation per queue
[ https://issues.apache.org/jira/browse/YARN-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4236: --- Attachment: YARN-4236.patch > Metric for aggregated resources allocation per queue > > > Key: YARN-4236 > URL: https://issues.apache.org/jira/browse/YARN-4236 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4236.patch > > > We currently track allocated memory and allocated vcores per queue but we > don't have a good rate metric on how fast we're allocating these things. In > other words, a straight line in allocatedmb could equally be one extreme of > no new containers are being allocated or allocating a bunch of containers > where we free exactly what we allocate each time. Adding a resources > allocated per second per queue would give us a better insight into the rate > of resource churn on a queue. Based on this aggregated resource allocation > per queue we can easily have some tools to measure the rate of resource > allocation per queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4236) Metric for aggregated resources allocation per queue
Chang Li created YARN-4236: -- Summary: Metric for aggregated resources allocation per queue Key: YARN-4236 URL: https://issues.apache.org/jira/browse/YARN-4236 Project: Hadoop YARN Issue Type: Improvement Reporter: Chang Li Assignee: Chang Li We currently track allocated memory and allocated vcores per queue but we don't have a good rate metric on how fast we're allocating these things. In other words, a straight line in allocatedmb could equally be one extreme of no new containers are being allocated or allocating a bunch of containers where we free exactly what we allocate each time. Adding a resources allocated per second per queue would give us a better insight into the rate of resource churn on a queue. Based on this aggregated resource allocation per queue we can easily have some tools to measure the rate of resource allocation per queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
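The distinction the description draws between an instantaneous gauge and a rate can be seen in a small sketch (illustrative only, not the actual QueueMetrics change): a monotonically increasing aggregate counter per queue lets a sampler derive an allocation rate, which the flat `allocatedMB` gauge cannot provide.

```java
// Minimal sketch of a per-queue aggregate-allocation counter: every
// container allocation adds its resources to a running total. Sampling
// the counter at two points in time and dividing by the interval gives
// resources allocated per second, even when frees exactly offset
// allocations and the instantaneous gauge stays flat.
class QueueAllocationCounterSketch {
  private long aggregateMemoryMb = 0;
  private long aggregateVcores = 0;

  synchronized void recordAllocation(long memoryMb, int vcores) {
    aggregateMemoryMb += memoryMb;
    aggregateVcores += vcores;
  }

  synchronized long getAggregateMemoryMb() { return aggregateMemoryMb; }
  synchronized long getAggregateVcores() { return aggregateVcores; }
}
```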
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947680#comment-14947680 ] MENG DING commented on YARN-3026: - Hi, [~leftnoteasy] I've got another question while studying this patch. Regarding the following code change: {code} @@ -1106,6 +958,11 @@ private Resource computeUserLimit(FiCaSchedulerApp application, queueCapacities.getAbsoluteCapacity(nodePartition), minimumAllocation); +// Assume we have required resource equals to minimumAllocation, this can +// make sure user limit can continuously increase till queueMaxResource +// reached. +Resource required = minimumAllocation; + {code} Before this patch, the required resource is passed into the {{computeUserLimit}} function as a parameter, indicating the actual resource requirement. Now it is always set to minimumAllocation. I understand that the leaf queue won't know the required resource any more since the patch moves the application specific logic out of the {{LeafQueue.java}}, but the fact that the required resource can be set to some arbitrary value seems quite odd. More specifically, it seems that the *required* resource only applies when the queue is over capacity when calculating the userLimit, but I am not sure how useful this userLimit is under this circumstance (i.e., over capacity situation)? I know it is used to calculate application headroom, but this headroom is not being checked for resource allocation (which only checks the {{ResourceLimits.headroom}} set in {{AbstractCSQueue.canAssignToThisQueue}}). Not sure if I've made myself clear... 
Thanks in advance for shedding some light on this :-) > Move application-specific container allocation logic from LeafQueue to > FiCaSchedulerApp > --- > > Key: YARN-3026 > URL: https://issues.apache.org/jira/browse/YARN-3026 > Project: Hadoop YARN > Issue Type: Task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.8.0 > > Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, > YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch > > > Have a discussion with [~vinodkv] and [~jianhe]: > In existing Capacity Scheduler, all allocation logics of and under LeafQueue > are located in LeafQueue.java in implementation. To make a cleaner scope of > LeafQueue, we'd better move some of them to FiCaSchedulerApp. > Ideal scope of LeafQueue should be: when a LeafQueue receives some resources > from ParentQueue (like 15% of cluster resource), and it distributes resources > to children apps, and it should be agnostic to internal logic of children > apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how > application allocating container from given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
[ https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4235: Attachment: YARN-4235.001.patch Handle empty groups > FairScheduler PrimaryGroup does not handle empty groups returned for a user > > > Key: YARN-4235 > URL: https://issues.apache.org/jira/browse/YARN-4235 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-4235.001.patch > > > We see NPE if empty groups are returned for a user. This causes a NPE and > cause RM to crash as below > {noformat} > 2015-09-22 16:51:52,780 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:3212) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2015-09-22 16:51:52,797 INFO > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947641#comment-14947641 ] Jason Lowe edited comment on YARN-3943 at 10/7/15 9:36 PM: --- Thanks for the patch, [~zxu]! I think this patch causes backwards compatibility problems for users who have increased the max-disk-utilization-per-disk-percentage from the default. For example, if someone changed it to 100 and only wants disks to be treated full if they actually fill up completely then this patch will silently make those disks unusable once they fill until they free up at least 10% of their space. That's not going to be good if the cluster runs the disks over 90% most of the time, as it will effectively take the disk out of rotation for a long, long time. A more compatible change would be to _not_ set a value in yarn-default for this property. If the property is set then we use it, but if we get a null for it then we know it isn't specified and we should just use the same threshold for low as for high. Shouldn't we be capping the max of diskUtilizationPercentageCutoffLow based on diskUtilizationPercentageCutoffHigh rather than 100.0? Nit: I think using Math.max / Math.min could make the range capping more readable (certainly less redundant with the identifiers and constants), but it's not must-fix. Nit: "Use" should be "Using" in the following log message. "Use" implies we are giving a directive requiring the user to do it, while "Using" states the code is doing it for them. {code} + if (lowUsableSpacePercentagePerDisk > highUsableSpacePercentagePerDisk) { +LOG.warn("Use " + YarnConfiguration. +NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE + " as " + +YarnConfiguration.NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +", because " + YarnConfiguration. +NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +" is not configured properly."); {code} was (Author: jlowe): Thanks for the patch, [~zxu]! 
I think this patch causes backwards compatibility problems for users who have increased the max-disk-utilization-per-disk-percentage from the default. For example, if someone changed it to 100 and only wants disks to be treated full if they actually fill up completely then this patch will silently make those disks unusable once they fill until they free up at least 10% of their space. That's not going to be good if the cluster runs the disks over 90% most of the time, as it will effectively take the disk out of rotation for a long, long time. A more compatible change would be to _not_ set a value in yarn-default for this property. If the property is set then we use it, but if we get a null for it then we know it isn't specified and we shouldn't just use the same threshold for low as for high. Shouldn't we be capping the max of diskUtilizationPercentageCutoffLow based on diskUtilizationPercentageCutoffHigh rather than 100.0? Nit: I think using Math.max / Math.min could make the range capping more readable (certainly less redundant with the identifiers and constants), but it's not must-fix. Nit: "Use" should be "Using" in the following log message. "Use" implies we are giving a directive requiring the user to do it, while "Using" states the code is doing it for them. {code} + if (lowUsableSpacePercentagePerDisk > highUsableSpacePercentagePerDisk) { +LOG.warn("Use " + YarnConfiguration. +NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE + " as " + +YarnConfiguration.NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +", because " + YarnConfiguration. +NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +" is not configured properly."); {code} > Use separate threshold configurations for disk-full detection and > disk-not-full detection. 
> -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947641#comment-14947641 ] Jason Lowe commented on YARN-3943: -- Thanks for the patch, [~zxu]! I think this patch causes backwards compatibility problems for users who have increased the max-disk-utilization-per-disk-percentage from the default. For example, if someone changed it to 100 and only wants disks to be treated full if they actually fill up completely then this patch will silently make those disks unusable once they fill until they free up at least 10% of their space. That's not going to be good if the cluster runs the disks over 90% most of the time, as it will effectively take the disk out of rotation for a long, long time. A more compatible change would be to _not_ set a value in yarn-default for this property. If the property is set then we use it, but if we get a null for it then we know it isn't specified and we shouldn't just use the same threshold for low as for high. Shouldn't we be capping the max of diskUtilizationPercentageCutoffLow based on diskUtilizationPercentageCutoffHigh rather than 100.0? Nit: I think using Math.max / Math.min could make the range capping more readable (certainly less redundant with the identifiers and constants), but it's not must-fix. Nit: "Use" should be "Using" in the following log message. "Use" implies we are giving a directive requiring the user to do it, while "Using" states the code is doing it for them. {code} + if (lowUsableSpacePercentagePerDisk > highUsableSpacePercentagePerDisk) { +LOG.warn("Use " + YarnConfiguration. +NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE + " as " + +YarnConfiguration.NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +", because " + YarnConfiguration. +NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +" is not configured properly."); {code} > Use separate threshold configurations for disk-full detection and > disk-not-full detection. 
> -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
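The two-threshold hysteresis discussed above, together with the backward-compatibility behavior Jason describes (an unset low cutoff falling back to the high cutoff), can be sketched as follows. This is illustrative only, not the YARN-3943 patch; the class and parameter names are assumptions.

```java
// Sketch of disk-full hysteresis: a disk flips to FULL when utilization
// exceeds the high cutoff and only flips back to GOOD once it drops
// below the low cutoff, avoiding oscillation near a single threshold.
// A null low cutoff falls back to the high cutoff, preserving the old
// single-threshold behavior for users who tuned the max-utilization
// property (e.g. to 100).
class DiskFullnessCheckSketch {
  private final float highCutoff;
  private final float lowCutoff;
  private boolean full = false;

  DiskFullnessCheckSketch(float highCutoff, Float lowCutoffOrNull) {
    this.highCutoff = highCutoff;
    // Cap the low cutoff at the high cutoff (rather than at 100.0),
    // per the review comment above.
    this.lowCutoff = lowCutoffOrNull == null
        ? highCutoff : Math.min(lowCutoffOrNull, highCutoff);
  }

  boolean isFull(float utilizationPercent) {
    if (full) {
      full = utilizationPercent > lowCutoff;   // stay full until below low
    } else {
      full = utilizationPercent > highCutoff;  // become full above high
    }
    return full;
  }
}
```

With high = 90 and low = 80, a disk at 85% stays out of rotation if it was full but stays usable if it was not, which is exactly the oscillation-damping behavior the issue asks for.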
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947620#comment-14947620 ] Sangjin Lee commented on YARN-3367: --- {quote} You mean that flush need not be coupled with the sync call ? if so will we be exposing one more interface in client side ? Felt coupling flush with sync was a better option. {quote} I meant either coupling async with no flush or coupling sync with explicit flush. But I vaguely remember [~djp] was of the opinion that sync doesn't necessarily imply flush. So this requires a more discussion. {quote} Agree with your points only concerned what were issues/points Junping Du had, when he had mentioned in the description as 3. The sequence of events could be out of order because each posting operation thread get out of waiting loop randomly. We should have something like event loop in TimelineClient side, putEntities() only put related entities into a queue of entities and a separated thread handle to deliver entities in queue to collector via REST call. {quote} Yes, understood. I guess I am trying to say that even if we can preserve the order of event writing, there is still no hard guarantee that they will be received by the server in the same order. Therefore, I am of the opinion that we do not need to spend too much effort ensuring this order (would be nice, but would not be fatal if we didn't do it). The server would just need to rely on explicit timestamps if it needs to determine the order of events. 
> Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Attachments: YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we add loop in TimelineClient to wait for > collectorServiceAddress ready before posting any entity. In consumer of > TimelineClient (like AM), we are starting a new thread for each call to get > rid of potential deadlock in main thread. This way has at least 3 major > defects: > 1. The consumer need some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It cost many thread resources which is unnecessary. > 3. The sequence of events could be out of order because each posting > operation thread get out of waiting loop randomly. > We should have something like event loop in TimelineClient side, > putEntities() only put related entities into a queue of entities and a > separated thread handle to deliver entities in queue to collector via REST > call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
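The event-loop shape the description calls for can be sketched roughly as below. This is illustrative only, not the actual TimelineClient code; the class, method, and entity representation are assumptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch of the proposed pattern: putEntitiesAsync() only enqueues, and a
// single dispatcher thread drains the queue and performs the REST call.
// Callers no longer spawn a thread per put, and entities leave the client
// in enqueue order -- though, as noted in the comment above, the server
// still cannot assume arrival order and should rely on explicit timestamps.
class TimelineEventLoopSketch {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final Thread dispatcher;
  private volatile boolean stopped = false;

  TimelineEventLoopSketch(Consumer<String> postToCollector) {
    dispatcher = new Thread(() -> {
      try {
        // Keep draining until stop() is called AND the queue is empty.
        while (!stopped || !queue.isEmpty()) {
          String entity = queue.poll(100, TimeUnit.MILLISECONDS);
          if (entity != null) {
            postToCollector.accept(entity); // the REST call in a real client
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    dispatcher.start();
  }

  void putEntitiesAsync(String entity) {
    queue.add(entity); // returns immediately; no per-call thread needed
  }

  void stop() {
    stopped = true;
    try {
      dispatcher.join(); // drains remaining entities before returning
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

A sync put or an explicit flush could then be layered on top by blocking until the queue drains, which is the coupling question debated above.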
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Attachment: YARN-4234.1.patch > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
Anubhav Dhoot created YARN-4235: --- Summary: FairScheduler PrimaryGroup does not handle empty groups returned for a user Key: YARN-4235 URL: https://issues.apache.org/jira/browse/YARN-4235 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot We see NPE if empty groups are returned for a user. This causes a NPE and cause RM to crash as below {noformat} 2015-09-22 16:51:52,780 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ADDED to the scheduler java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 2015-09-22 16:51:52,797 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947315#comment-14947315 ] Bikas Saha commented on YARN-1509: -- Sorry for coming in late on this. I have some questions on the API. Why are there separate methods for increase and decrease instead of a single method to change the container resource size? By comparing the existing resource allocation to a container and the new requested resource allocation, it should be clear whether an increase or decrease is being requested. Also, for completeness, is there a need for a cancelContainerResourceChange()? After a container resource change request has been submitted, what are my options as a user other than to wait for the request to be satisfied by the RM? If I release the container, then does it mean all pending change requests for that container should be removed? From a quick look at the patch, it does not look like that is being covered, unless I am missing something. What will happen if the AM restarts after submitting a change request. Does the AM-RM re-register protocol need an update to handle the case of re-synchronizing on the change requests? Whats happens if the RM restarts? If these are explained in a document, then please point me to the document. The patch did not seem to have anything around this area. So I thought I would ask. Also, why have the callback interface methods been made non-public? Would that be an incompatible change? 
> Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
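Bikas's first question above — why two methods instead of one — hinges on the fact that comparing the current and requested capabilities already determines the direction. A hypothetical sketch of that classification (all names are illustrative; this is not the real AMRMClient API):

```java
// Hypothetical single "change container resource" entry point: the
// caller supplies only the target capability, and the client derives
// whether the request is an increase or a decrease by comparing it to
// the container's current allocation.
class ResourceChangeSketch {
  enum Direction { INCREASE, DECREASE, MIXED, NONE }

  static Direction classify(long currentMb, int currentVcores,
                            long targetMb, int targetVcores) {
    boolean up = targetMb > currentMb || targetVcores > currentVcores;
    boolean down = targetMb < currentMb || targetVcores < currentVcores;
    if (up && down) {
      return Direction.MIXED; // e.g. more memory but fewer vcores
    }
    if (up) {
      return Direction.INCREASE;
    }
    if (down) {
      return Direction.DECREASE;
    }
    return Direction.NONE; // no-op request; could be rejected up front
  }
}
```

The MIXED case shows why the design choice matters: a single API must decide whether such requests are allowed, split into two operations, or rejected.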
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Description: In this ticket, we will add new put APIs in timelineClient to let clients/applications have the option to use ATS v1.5 (was: In this ticket, we will add new put api in timelineClient to let clients/applications have the option to use ATS v1.5) > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Summary: New put APIs in TimelineClient for ats v1.5 (was: New put api in TimelineClient for ats v1.5) > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > In this ticket, we will add new put api in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4234) New put api in TimelineClient for ats v1.5
Xuan Gong created YARN-4234: --- Summary: New put api in TimelineClient for ats v1.5 Key: YARN-4234 URL: https://issues.apache.org/jira/browse/YARN-4234 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Xuan Gong Assignee: Xuan Gong In this ticket, we will add new put api in timelineClient to let clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947261#comment-14947261 ] Naganarasimha G R commented on YARN-3367: - Hi [~sjlee0] , Thanks for the comments, Went through the comments which had pointed out in YARN-4061. Felt most of them valid. bq. Whether we will do that on the sync side or not, I think we have some flexibility. You mean that flush need not be coupled with the sync call ? if so will we be exposing one more interface in client side ? Felt coupling flush with sync was a better option. bq. As for the timestamps, I am also arguing that the timestamps should always be set explicitly for entities/metrics/events, and that the server should rely on the explicit timestamps, rather than on time of receipt. Agree with your points only concerned what were issues/points [~djp] had, when he had mentioned in the description as bq. 3. The sequence of events could be out of order because each posting operation thread get out of waiting loop randomly. We should have something like event loop in TimelineClient side, putEntities() only put related entities into a queue of entities and a separated thread handle to deliver entities in queue to collector via REST call. > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Attachments: YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we add loop in TimelineClient to wait for > collectorServiceAddress ready before posting any entity. In consumer of > TimelineClient (like AM), we are starting a new thread for each call to get > rid of potential deadlock in main thread. This way has at least 3 major > defects: > 1. 
The consumer need some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It cost many thread resources which is unnecessary. > 3. The sequence of events could be out of order because each posting > operation thread get out of waiting loop randomly. > We should have something like event loop in TimelineClient side, > putEntities() only put related entities into a queue of entities and a > separated thread handle to deliver entities in queue to collector via REST > call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
[ https://issues.apache.org/jira/browse/YARN-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-4233: --- Assignee: Xuan Gong > YARN Timeline Service plugin: ATS v1.5 > -- > > Key: YARN-4233 > URL: https://issues.apache.org/jira/browse/YARN-4233 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4219) New levelDB cache storage for timeline v1.5
[ https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4219: Issue Type: Sub-task (was: Bug) Parent: YARN-4233 > New levelDB cache storage for timeline v1.5 > --- > > Key: YARN-4219 > URL: https://issues.apache.org/jira/browse/YARN-4219 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > > We need to have an "offline" caching storage for timeline server v1.5 after > the changes in YARN-3942. The in memory timeline storage may run into OOM > issues when used as a cache storage for entity file timeline storage. We can > refactor the code and have a level db based caching storage for this use > case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3942: Issue Type: Sub-task (was: Improvement) Parent: YARN-4233 > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942-leveldb.001.patch, > YARN-3942-leveldb.002.patch, YARN-3942.001.patch, YARN-3942.002.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on a large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
Xuan Gong created YARN-4233: --- Summary: YARN Timeline Service plugin: ATS v1.5 Key: YARN-4233 URL: https://issues.apache.org/jira/browse/YARN-4233 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Xuan Gong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305
[ https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947142#comment-14947142 ] Neelesh Srinivas Salian commented on YARN-3996: --- I'll go back and fix those. I know why this happened: the added implementation of incrementAllocationCapability on the Fifo and Capacity schedulers. I need to do this in a better way and will update soon. Thank you. > YARN-789 (Support for zero capabilities in fairscheduler) is broken after > YARN-3305 > --- > > Key: YARN-3996 > URL: https://issues.apache.org/jira/browse/YARN-3996 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, fairscheduler >Reporter: Anubhav Dhoot >Assignee: Neelesh Srinivas Salian >Priority: Critical > Attachments: YARN-3996.001.patch, YARN-3996.prelim.patch > > > RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest > with minimumResource for the incrementResource. This causes normalize to > return zero if the minimum is set to zero as per YARN-789 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
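The failure mode described here (normalization returning zero when the minimum, reused as the increment, is zero) can be illustrated with a simplified round-up. This is a sketch of the arithmetic only, not the actual DefaultResourceCalculator code, and the explicit guard for a zero increment is an assumption made to mirror the reported behavior.

```java
// Simplified sketch of memory normalization: round the ask up to the
// nearest multiple of the increment, then clamp to [minimum, maximum].
// When the minimum is passed in as the increment and is zero (as YARN-789
// allows for the fair scheduler), every request collapses to zero.
class NormalizeSketch {
  static long normalize(long ask, long minimum, long maximum, long increment) {
    if (increment <= 0) {
      // Degenerate case from the bug report: a zero increment wipes out
      // the request instead of passing it through unchanged.
      return 0;
    }
    long rounded = ((ask + increment - 1) / increment) * increment;
    return Math.max(minimum, Math.min(maximum, rounded));
  }
}
```

For example, a 1500 MB ask with a 1024 MB increment rounds up to 2048 MB, while the same ask with a zero minimum/increment yields 0, which is the reported breakage.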
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947108#comment-14947108 ] Sangjin Lee commented on YARN-3798: --- Please disregard. I thought the patch was misnamed, but it wasn't. Somehow jenkins checked out trunk for this test. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, > YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, > YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, > YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! > 2015-06-09 10:09:44,887 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > updating appAttempt: appattempt_1433764310492_7152_01 > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(Zo
[jira] [Commented] (YARN-4221) Store user in app to flow table
[ https://issues.apache.org/jira/browse/YARN-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947070#comment-14947070 ] Hadoop QA commented on YARN-4221: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 52s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 7s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 48s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 47s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 3m 43s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 42m 0s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765396/YARN-4221-YARN-2928.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 5a3db96 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9372/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9372/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9372/console | This message was automatically generated. > Store user in app to flow table > --- > > Key: YARN-4221 > URL: https://issues.apache.org/jira/browse/YARN-4221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4221-YARN-2928.01.patch, > YARN-4221-YARN-2928.02.patch > > > We should store the user as well in the app-to-flow table. > For queries where the user is not supplied and the flow context can be retrieved from > the app-to-flow table, we should take the user from the app-to-flow table instead of > considering the UGI as the default user. > This is as per the discussion on YARN-3864 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947056#comment-14947056 ] MENG DING commented on YARN-1509: - Test failure is not related to this patch. Checkstyle warning is the same as before. > Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947025#comment-14947025 ] Hadoop QA commented on YARN-1509: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 52s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 3s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 28s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 28s | The applied patch generated 5 new checkstyle issues (total was 79, now 78). | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 7m 23s | Tests failed in hadoop-yarn-client. 
| | | | 46m 45s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.api.impl.TestYarnClient | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765393/YARN-1509.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 61b3547 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9371/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9371/artifact/patchprocess/diffcheckstylehadoop-yarn-client.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9371/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9371/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9371/console | This message was automatically generated. > Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947014#comment-14947014 ] zhihai xu commented on YARN-3943: - Hi [~jlowe], could you help review the patch? Thanks. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configurations > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It would > be better to use two configurations: one used when a disk goes from not-full > to full, and the other used when it goes from full back to not-full, so > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
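The two-threshold proposal amounts to hysteresis: mark a disk full above a high watermark, and mark it good again only once utilization drops below a lower watermark. A minimal sketch follows; the class and method names are illustrative, not the NodeManager's actual disk-health-checker code.

```java
// Sketch of the proposed hysteresis: with a single threshold, a disk
// hovering around it flips between full and good on every check. With
// two thresholds, the disk must cross the *other* watermark to change
// state, which suppresses the oscillation.
class DiskHealthSketch {
  private final float fullThreshold;     // e.g. 95% -> mark the disk full
  private final float notFullThreshold;  // e.g. 90% -> mark it good again
  private boolean full = false;

  DiskHealthSketch(float fullThreshold, float notFullThreshold) {
    this.fullThreshold = fullThreshold;
    this.notFullThreshold = notFullThreshold;
  }

  // Feed the current utilization percentage; returns true if the disk is
  // considered full after this observation.
  boolean update(float utilization) {
    if (!full && utilization > fullThreshold) {
      full = true;                        // crossed the high watermark
    } else if (full && utilization < notFullThreshold) {
      full = false;                       // dropped below the low watermark
    }
    return full;                          // in between: state is unchanged
  }
}
```

With thresholds of 95% and 90%, a disk bouncing between 92% and 94% keeps whatever state it last had instead of oscillating every health check.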
[jira] [Updated] (YARN-4221) Store user in app to flow table
[ https://issues.apache.org/jira/browse/YARN-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4221: --- Attachment: YARN-4221-YARN-2928.02.patch > Store user in app to flow table > --- > > Key: YARN-4221 > URL: https://issues.apache.org/jira/browse/YARN-4221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4221-YARN-2928.01.patch, > YARN-4221-YARN-2928.02.patch > > > We should store the user as well in the app-to-flow table. > For queries where the user is not supplied and the flow context can be retrieved from > the app-to-flow table, we should take the user from the app-to-flow table instead of > considering the UGI as the default user. > This is as per the discussion on YARN-3864 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1509: Attachment: YARN-1509.5.patch Thanks [~leftnoteasy]. Attaching the patch that addresses the comments. > Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4232) TopCLI console shows exceptions for help command
[ https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946658#comment-14946658 ] Bibin A Chundatt commented on YARN-4232: The test case failure is not related to this JIRA patch. > TopCLI console shows exceptions for help command > - > > Key: YARN-4232 > URL: https://issues.apache.org/jira/browse/YARN-4232 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4232.patch > > > *Steps to reproduce* > Start Top command in YARN in HA mode > ./yarn top > {noformat} > usage: yarn top > -cols Number of columns on the terminal > -delay The refresh delay(in seconds), default is 3 seconds > -help Print usage; for help while the tool is running press 'h' > + Enter > -queuesComma separated list of queues to restrict applications > -rows Number of rows on the terminal > -types Comma separated list of types to restrict applications, > case sensitive(though the display is lower case) > -users Comma separated list of users to restrict applications > {noformat} > Execute *for help while the tool is running press 'h' + Enter* while top > tool is running > Exception is thrown in console continuously > {noformat} > 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time > java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at 
sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.(HttpClient.java:211) > at sun.net.www.http.HttpClient.New(HttpClient.java:308) > at sun.net.www.http.HttpClient.New(HttpClient.java:326) > at > sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998) > at > sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932) > at > org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:742) > at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:467) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:420) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946648#comment-14946648 ] Hudson commented on YARN-4228: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2403 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2403/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt > FileSystemRMStateStore use IOUtils#close instead of fs#close > > > Key: YARN-4228 > URL: https://issues.apache.org/jira/browse/YARN-4228 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-4228.patch > > > NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service > initialization fails on rm start up > {noformat} > 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore > failed in state STOPPED; cause: java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:250) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
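The NPE above occurs because close is invoked on a FileSystem that was never initialized (active-service init failed before it was opened). The fix replaces the direct close with a null-safe one in the spirit of Hadoop's IOUtils; the helper below is a minimal stand-in for illustration, not the actual IOUtils API.

```java
import java.io.Closeable;
import java.io.IOException;

// Minimal stand-in for a null-safe close: a null resource (for example a
// FileSystem that failed to initialize) is ignored instead of triggering
// a NullPointerException during shutdown, and IOExceptions from close()
// do not abort the rest of the shutdown sequence.
class CloseSketch {
  static void closeQuietly(Closeable c) {
    if (c == null) {
      return; // nothing was ever opened; closing is a no-op
    }
    try {
      c.close();
    } catch (IOException e) {
      // a real implementation would log here; shutdown must not fail
      // just because close failed
    }
  }
}
```

Service teardown paths run even when initialization aborted halfway, so every resource they touch has to tolerate being null.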
[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition
[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946646#comment-14946646 ] Hudson commented on YARN-4209: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2403 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2403/]) YARN-4209. RMStateStore FENCED state doesn't work due to (rohithsharmaks: rev 9156fc60c654e9305411686878acb443f3be1e67) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java > RMStateStore FENCED state doesn’t work due to updateFencedState called by > stateMachine.doTransition > --- > > Key: YARN-4209 > URL: https://issues.apache.org/jira/browse/YARN-4209 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-4209.000.patch, YARN-4209.001.patch, > YARN-4209.002.patch, YARN-4209.branch-2.7.patch > > > RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by > {{stateMachine.doTransition}}. The reason is > {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded > in {{stateMachine.doTransition}} called from public > API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So > right after the internal state transition from {{updateFencedState}} changes > the state to FENCED state, the external state transition changes the state > back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE > state even after {{notifyStoreOperationFailed}} is called. 
The only working > case for FENCED state is {{notifyStoreOperationFailed}} called from > {{ZKRMStateStore#VerifyActiveStatusThread}}. > For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter > external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => > {{notifyStoreOperationFailed}} > =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal > {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} > change state to FENCED => exit external {{stateMachine.doTransition}} change > state to ACTIVE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
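The re-entrancy problem described above can be reproduced with a deliberately simplified sketch (this is not the actual StateMachineFactory code; the class and method names are illustrative): the outer transition applies its post-state after the nested transition has already run, clobbering the nested change to FENCED.

```java
// Minimal reproduction of the nested-transition problem: the outer
// transition's post-state is written unconditionally after the handler
// body runs, so a state change made by a nested transition inside that
// body (here, ACTIVE -> FENCED) is silently overwritten.
class FencedSketch {
  enum State { ACTIVE, FENCED }
  State state = State.ACTIVE;

  // Outer transition, standing in for RemoveRMDTTransition: on failure it
  // calls notifyStoreOperationFailed(), which performs a nested transition
  // to FENCED -- but the outer transition then applies its own post-state.
  void handleRemoveToken(boolean storeOperationFails) {
    if (storeOperationFails) {
      notifyStoreOperationFailed();  // nested transition: state -> FENCED
    }
    state = State.ACTIVE;            // outer post-state clobbers FENCED
  }

  // Stand-in for the internal updateFencedState transition.
  void notifyStoreOperationFailed() {
    state = State.FENCED;
  }
}
```

Running the failure path leaves the store ACTIVE rather than FENCED, which is exactly the end result the description reports; the fix has to make the FENCED transition survive (or bypass) the enclosing transition's post-state.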
[jira] [Commented] (YARN-4232) TopCLI console shows exceptions for help command
[ https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946598#comment-14946598 ] Hadoop QA commented on YARN-4232: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 6s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 28s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 31s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 58s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 8m 0s | Tests failed in hadoop-yarn-client. 
| | | | 47m 46s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.TestApplicationClientProtocolOnHA | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765343/0001-YARN-4232.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3112f26 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9370/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9370/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9370/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9370/console | This message was automatically generated. 
> TopCLI console shows exceptions for help command > - > > Key: YARN-4232 > URL: https://issues.apache.org/jira/browse/YARN-4232 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4232.patch > > > *Steps to reproduce* > Start Top command in YARN in HA mode > ./yarn top > {noformat} > usage: yarn top > -cols Number of columns on the terminal > -delay The refresh delay(in seconds), default is 3 seconds > -help Print usage; for help while the tool is running press 'h' > + Enter > -queuesComma separated list of queues to restrict applications > -rows Number of rows on the terminal > -types Comma separated list of types to restrict applications, > case sensitive(though the display is lower case) > -users Comma separated list of users to restrict applications > {noformat} > Execute *for help while the tool is running press 'h' + Enter* while top > tool is running > Exception is thrown in console continuously > {noformat} > 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time > java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.(HttpClient.java:211) > at sun.net.www.http.HttpClient.New(HttpClient.java:308) > at sun.net.www.http.HttpClient.New(HttpClient.java:326) > at > sun.net.www.protocol
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946540#comment-14946540 ] Varun Vasudev commented on YARN-4009: - [~jeagles] - just to make sure we're on the same page - you're suggesting that if we find the HttpCrossOriginFilterInitializer (which is present in hadoop-common) in the filter initializers list, we should just remove it from the initializers list and use the timeline server configurations instead? > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch > > > Currently the REST APIs do not have CORS support. This means any UI (running > in a browser) cannot consume the REST APIs. For example, the Tez UI would like to use > the REST API for getting the application and application attempt information exposed > by the APIs. > It would be very useful if CORS were enabled for the REST APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
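The idea being discussed (dropping HttpCrossOriginFilterInitializer from the configured filter-initializer list so the timeline server's CORS settings take effect instead) can be sketched roughly as follows. This is a hypothetical illustration, not the YARN-4009 patch; the helper name and the sample initializer list are invented.

```java
// Hypothetical sketch of the approach discussed above (not the actual
// YARN-4009 patch): filter out HttpCrossOriginFilterInitializer from a
// comma-separated filter-initializer list before applying it.
import java.util.ArrayList;
import java.util.List;

public class FilterInitializers {
    static final String CROSS_ORIGIN =
        "org.apache.hadoop.security.HttpCrossOriginFilterInitializer";

    // Returns the list with the cross-origin initializer removed.
    static String removeCrossOrigin(String initializers) {
        List<String> kept = new ArrayList<>();
        for (String name : initializers.split(",")) {
            if (!name.trim().equals(CROSS_ORIGIN)) {
                kept.add(name.trim());
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        // Invented sample configuration value.
        String conf = "org.example.SomeFilterInitializer," + CROSS_ORIGIN;
        System.out.println(removeCrossOrigin(conf));
        // org.example.SomeFilterInitializer
    }
}
```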
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946513#comment-14946513 ] Hudson commented on YARN-4228: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #464 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/464/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java > FileSystemRMStateStore use IOUtils#close instead of fs#close > > > Key: YARN-4228 > URL: https://issues.apache.org/jira/browse/YARN-4228 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-4228.patch > > > NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service > initialization fails on rm start up > {noformat} > 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore > failed in state STOPPED; cause: java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:250) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
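The NPE above arises because serviceStop still runs when active-service initialization fails before the FileSystem field is ever assigned, so a direct fs.close() dereferences null. A minimal sketch (simplified and with hypothetical names, in the spirit of Hadoop's IOUtils null-safe cleanup, not the actual patch) of why a null-safe close helper avoids the crash:

```java
// Illustrative sketch: a null-safe close helper skips fields that were
// never initialized instead of throwing NullPointerException.
import java.io.Closeable;
import java.io.IOException;

public class NullSafeClose {
    // Mimics the shape of IOUtils-style cleanup: ignore nulls and
    // swallow close() failures rather than propagating them.
    static void cleanup(Closeable... closeables) {
        for (Closeable c : closeables) {
            if (c == null) {
                continue; // e.g. fs was never assigned; skip, don't NPE
            }
            try {
                c.close();
            } catch (IOException e) {
                // a real implementation would log and continue
            }
        }
    }

    public static void main(String[] args) {
        Closeable fs = null;  // init failed before the field was set
        cleanup(fs);          // safe: no NullPointerException
        System.out.println("shutdown completed cleanly");
    }
}
```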
[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition
[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946512#comment-14946512 ] Hudson commented on YARN-4209: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #464 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/464/]) YARN-4209. RMStateStore FENCED state doesn’t work due to (rohithsharmaks: rev 9156fc60c654e9305411686878acb443f3be1e67) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java > RMStateStore FENCED state doesn’t work due to updateFencedState called by > stateMachine.doTransition > --- > > Key: YARN-4209 > URL: https://issues.apache.org/jira/browse/YARN-4209 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-4209.000.patch, YARN-4209.001.patch, > YARN-4209.002.patch, YARN-4209.branch-2.7.patch > > > RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by > {{stateMachine.doTransition}}. The reason is > {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded > in {{stateMachine.doTransition}} called from public > API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So > right after the internal state transition from {{updateFencedState}} changes > the state to FENCED state, the external state transition changes the state > back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE > state even after {{notifyStoreOperationFailed}} is called. 
The only working > case for FENCED state is {{notifyStoreOperationFailed}} called from > {{ZKRMStateStore#VerifyActiveStatusThread}}. > For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter > external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => > {{notifyStoreOperationFailed}} > =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal > {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} > change state to FENCED => exit external {{stateMachine.doTransition}} change > state to ACTIVE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
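The re-entrancy described above can be boiled down to a toy state machine. This is a hypothetical sketch, not the RMStateStore code; the point is only that when a transition hook re-enters doTransition, the outer call's precomputed target state overwrites whatever state the nested call set:

```java
// Toy reproduction of the nested-transition bug described above
// (invented names, not the YARN state-machine framework).
public class NestedTransition {
    enum State { ACTIVE, FENCED }

    State state = State.ACTIVE;

    // Buggy shape: the post-state is decided before the hook runs, so a
    // nested doTransition inside the hook gets clobbered afterwards.
    void doTransition(State target, Runnable hook) {
        hook.run();      // may re-enter doTransition and set FENCED
        state = target;  // overwrites the nested call's result
    }

    public static void main(String[] args) {
        NestedTransition sm = new NestedTransition();
        // Outer: an ordinary store operation whose failure handler fences.
        sm.doTransition(NestedTransition.State.ACTIVE, () ->
            // Inner: an updateFencedState-style nested transition.
            sm.doTransition(NestedTransition.State.FENCED, () -> { }));
        // The machine ends up ACTIVE even though fencing was requested.
        System.out.println(sm.state);
    }
}
```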
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946497#comment-14946497 ] Hudson commented on YARN-4228: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2435 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2435/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt > FileSystemRMStateStore use IOUtils#close instead of fs#close -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4232) TopCLI console shows exceptions for help command
[ https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4232: --- Attachment: 0001-YARN-4232.patch Currently HA mode is not supported. For getting the RM start time, the HTTP request is submitted to the default IP and port. Clearing the screen and showing the help message when the request fails. Attaching patch for the same. > TopCLI console shows exceptions for help command > - > > Key: YARN-4232 > URL: https://issues.apache.org/jira/browse/YARN-4232 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4232.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946465#comment-14946465 ] Varun Saxena commented on YARN-4178: Thanks for the commit, [~sjlee0], and thanks to the others for the reviews. > [storage implementation] app id as string in row keys can cause incorrect > ordering > -- > > Key: YARN-4178 > URL: https://issues.apache.org/jira/browse/YARN-4178 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Fix For: YARN-2928 > > Attachments: YARN-4178-YARN-2928.01.patch, > YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, > YARN-4178-YARN-2928.04.patch, YARN-4178-YARN-2928.05.patch > > > Currently the app id is used in various places as part of row keys. However, > it is currently treated as a string. This will cause a problem with > ordering when the id portion of the app id rolls over to the next digit. > For example, "app_1234567890_1" will be considered *earlier* than > "app_1234567890_". We should correct this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
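The ordering problem quoted above can be reproduced in a few lines. This is an illustrative sketch, not the YARN-4178 patch (which changes how the id is encoded into row keys); the padding helper and the sample application ids are hypothetical:

```java
// Why app ids sorted as raw strings order incorrectly, and how a
// fixed-width encoding of the sequence number restores numeric order.
// Sample ids and the pad() helper are invented for illustration.
import java.util.Arrays;

public class AppIdOrdering {
    public static void main(String[] args) {
        String[] ids = {"app_1234567890_9", "app_1234567890_10"};

        // Lexicographic (byte-wise) ordering, as in a row key:
        // "_10" sorts before "_9" because '1' < '9'.
        String[] lex = ids.clone();
        Arrays.sort(lex);
        System.out.println(Arrays.toString(lex));
        // [app_1234567890_10, app_1234567890_9]

        // Zero-padding the sequence number to a fixed width makes the
        // lexicographic order match the numeric order.
        String[] padded = {pad(ids[0]), pad(ids[1])};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded));
        // [app_1234567890_0000000009, app_1234567890_0000000010]
    }

    // Rewrites the trailing sequence number as a zero-padded 10-digit field.
    static String pad(String appId) {
        int idx = appId.lastIndexOf('_');
        long seq = Long.parseLong(appId.substring(idx + 1));
        return appId.substring(0, idx + 1) + String.format("%010d", seq);
    }
}
```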
[jira] [Created] (YARN-4232) TopCLI console shows exceptions for help command
Bibin A Chundatt created YARN-4232: -- Summary: TopCLI console shows exceptions for help command Key: YARN-4232 URL: https://issues.apache.org/jira/browse/YARN-4232 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor *Steps to reproduce* Start Top command in YARN in HA mode ./yarn top {noformat} usage: yarn top -cols Number of columns on the terminal -delay The refresh delay(in seconds), default is 3 seconds -help Print usage; for help while the tool is running press 'h' + Enter -queuesComma separated list of queues to restrict applications -rows Number of rows on the terminal -types Comma separated list of types to restrict applications, case sensitive(though the display is lower case) -users Comma separated list of users to restrict applications {noformat} Execute *for help while the tool is running press 'h' + Enter* while top tool is running Exception is thrown in console continuously {noformat} 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at java.net.Socket.connect(Socket.java:538) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.(HttpClient.java:211) at sun.net.www.http.HttpClient.New(HttpClient.java:308) at sun.net.www.http.HttpClient.New(HttpClient.java:326) at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168) at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998) at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932) at org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:742) at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:467) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:420) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946451#comment-14946451 ] Hudson commented on YARN-4228: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #499 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/499/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java > FileSystemRMStateStore use IOUtils#close instead of fs#close -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition
[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946394#comment-14946394 ] Hudson commented on YARN-4209: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2434 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2434/]) YARN-4209. RMStateStore FENCED state doesn't work due to (rohithsharmaks: rev 9156fc60c654e9305411686878acb443f3be1e67) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java > RMStateStore FENCED state doesn't work due to updateFencedState called by > stateMachine.doTransition -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946392#comment-14946392 ] Hudson commented on YARN-4228: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1228 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1228/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt > FileSystemRMStateStore use IOUtils#close instead of fs#close -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition
[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946390#comment-14946390 ] Hudson commented on YARN-4209: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1228 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1228/]) YARN-4209. RMStateStore FENCED state doesn't work due to (rohithsharmaks: rev 9156fc60c654e9305411686878acb443f3be1e67) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java > RMStateStore FENCED state doesn't work due to updateFencedState called by > stateMachine.doTransition -- This message was sent by Atlassian JIRA (v6.3.4#6332)