[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1512#comment-1512
 ] 

Hadoop QA commented on YARN-3367:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
48s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 26s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 6s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
29s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 38s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
49s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
28s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 40s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 18s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 10s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 23s 
{color} | {color:red} root: patch generated 9 new + 712 unchanged - 11 fixed = 
721 total (was 723) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 13s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 2s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 40s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 26s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 55s {color} 
| 

[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2016-02-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130206#comment-15130206
 ] 

Jian He commented on YARN-4138:
---

Patch looks good to me overall,
one question for this test case:
After step 6, rmContainer.getLastConfirmedResource() will return 3G; when the 
expire event gets triggered, won't it reset it back to 3G?
{code}
/**
 * 1. Allocate 1 container: containerId2 (1G)
 * 2. Increase resource of containerId2: 1G -> 3G
 * 3. AM acquires the token
 * 4. Increase resource of containerId2 again: 3G -> 6G
 * 5. AM acquires the token
 * 6. AM uses the 1st token to increase the container in NM to 3G
 * 7. AM does NOT use the second token
 * 8. Verify containerId2 eventually uses 1G after token expires
{code}
- I think RMContainerImpl will not receive the EXPIRE event in the RUNNING state 
after this patch? If so, we can remove this:
{code}
.addTransition(RMContainerState.RUNNING, RMContainerState.RUNNING,
RMContainerEventType.EXPIRE)
{code}

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2016-02-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130015#comment-15130015
 ] 

Varun Saxena commented on YARN-3367:


testSyncCall is failing again.
Instead of a fixed sleep period, maybe we can sleep in a loop (until the 
condition is met) and put an overall timeout on the test case, as sketched below.
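For illustration, a rough sketch of that polling pattern; isSyncCallComplete() 
and the timing values are placeholders, not taken from the actual test:
{code}
// Illustrative only: poll for the condition instead of a fixed Thread.sleep().
// isSyncCallComplete() is a hypothetical accessor on the test client; the
// 5-second budget and 100 ms poll interval are arbitrary choices.
@Test(timeout = 30000)
public void testSyncCall() throws Exception {
  // ... trigger the sync putEntities() call from another thread ...
  long deadline = System.currentTimeMillis() + 5000;
  while (!client.isSyncCallComplete()
      && System.currentTimeMillis() < deadline) {
    Thread.sleep(100);
  }
  Assert.assertTrue("sync call did not complete in time",
      client.isSyncCallComplete());
}
{code}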

I will check the patch in detail.

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-YARN-2928.v1.005.patch, 
> YARN-3367-YARN-2928.v1.006.patch, YARN-3367-YARN-2928.v1.007.patch, 
> YARN-3367-YARN-2928.v1.008.patch, YARN-3367-YARN-2928.v1.009.patch, 
> YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch, 
> sjlee-suggestion.patch
>
>
> Since YARN-3039, we add a loop in TimelineClient to wait for 
> collectorServiceAddress to be ready before posting any entity. In consumers of 
> TimelineClient (like the AM), we start a new thread for each call to avoid a 
> potential deadlock in the main thread. This approach has at least 3 major 
> defects:
> 1. The consumer needs additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting thread 
> gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the entities into a queue, and a separate thread 
> delivers the queued entities to the collector via REST calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-02-03 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130109#comment-15130109
 ] 

Jun Gong commented on YARN-3998:


[~vvasudev], I just attached a new patch to address the above problems. Thanks 
for the review.

1) When finding the container's previous working directory and log directory, 
only look for the corresponding files in good directories, i.e. directories that 
are readable/writable and not full.

2) Limiting the diagnostic message's length to 1 bytes. If the length is 
greater than that, delete the first line (the line separator is "\n"); see the 
sketch after this list.

3) After some container retries, the env variable 
*MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX* (DEFAULT_NM_ADMIN_USER_ENV) will be 
expanded to *MALLOC_ARENA_MAX=::* (a lot of ":"). I fixed it 
in *Apps#addToEnvironment*.
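As an illustration of item 2, a minimal sketch of that truncation policy; the 
byte limit and the method name are placeholders, not taken from the patch:
{code}
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: keep the diagnostics under a byte limit by dropping
// the oldest line (everything up to the first "\n") until it fits. The limit
// value and the class/method names are placeholders, not from the patch.
public class DiagnosticsTruncation {
  private static final int MAX_DIAG_BYTES = 10000; // placeholder limit

  static String truncate(String diagnostics) {
    while (diagnostics.getBytes(StandardCharsets.UTF_8).length > MAX_DIAG_BYTES) {
      int firstNewline = diagnostics.indexOf('\n');
      if (firstNewline < 0) {
        break; // a single long line; nothing more to drop
      }
      diagnostics = diagnostics.substring(firstNewline + 1);
    }
    return diagnostics;
  }
}
{code}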

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, YARN-3998.06.patch
>
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value. Then the NM will re-launch the 
> container 'retry-times' times when it fails to run (e.g. the exit code is not 0). 
> This saves a lot of time: it avoids container localization, the RM does not 
> need to re-schedule the container, and local files in the container's working 
> directory are left for re-use (if the container has downloaded some big 
> files, it does not need to re-download them when running again). 
> We find this useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4625) Make ApplicationSubmissionContext and ApplicationSubmissionContextInfo more consistent

2016-02-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130246#comment-15130246
 ] 

Hudson commented on YARN-4625:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9237 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9237/])
YARN-4625. Make ApplicationSubmissionContext and (vvasudev: rev 
1adb64e09bd453f97e83d31b1587079e30b4b274)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ApplicationSubmissionContextInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/LogAggregationContextInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AMBlackListingRequestInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md
* hadoop-yarn-project/CHANGES.txt


> Make ApplicationSubmissionContext and ApplicationSubmissionContextInfo more 
> consistent
> --
>
> Key: YARN-4625
> URL: https://issues.apache.org/jira/browse/YARN-4625
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.9.0
>
> Attachments: YARN-4625.2.patch, YARN-4625.20160121.1.patch, 
> YARN-4625.3.patch
>
>
> There are some differences between ApplicationSubmissionContext and 
> ApplicationSubmissionContextInfo; for example, we cannot submit an application 
> with logAggregationContext specified through the RM web service. We could make 
> them more consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-02-03 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3998:
---
Attachment: YARN-3998.06.patch

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, YARN-3998.06.patch
>
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value. Then the NM will re-launch the 
> container 'retry-times' times when it fails to run (e.g. the exit code is not 0). 
> This saves a lot of time: it avoids container localization, the RM does not 
> need to re-schedule the container, and local files in the container's working 
> directory are left for re-use (if the container has downloaded some big 
> files, it does not need to re-download them when running again). 
> We find this useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130229#comment-15130229
 ] 

Hadoop QA commented on YARN-3998:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 58s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 4 new + 
522 unchanged - 4 fixed = 526 total (was 526) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 47s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 40s 
{color} | {color:green} 

[jira] [Updated] (YARN-4446) Refactor reader API for better extensibility

2016-02-03 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4446:
---
Attachment: (was: YARN-4446-YARN-2928.03.patch)

> Refactor reader API for better extensibility
> 
>
> Key: YARN-4446
> URL: https://issues.apache.org/jira/browse/YARN-4446
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4446-YARN-2928.01.patch, 
> YARN-4446-YARN-2928.02.patch, YARN-4446-YARN-2928.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2016-02-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130380#comment-15130380
 ] 

Steve Loughran commented on YARN-4435:
--

You'll need to add one for the timeline delegation token too; without that you 
can't submit work to a cluster which has ATS enabled. Again, this is a YARN 
service and follows the same lifecycle.

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Matthew Paduano
>Assignee: Matthew Paduano
> Attachments: proposed_solution
>
>
> Add a class to yarn project that implements the DtFetcher interface to return 
> a RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4635) Add global blacklist tracking for AM container failure.

2016-02-03 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130272#comment-15130272
 ] 

Junping Du commented on YARN-4635:
--

Thanks [~jianhe] for review and comments.
First, I would like to state an assumption: the blacklist mechanism for AM 
launching is not for tracking nodes that do not work at all (unhealthy), but for 
tracking nodes suspected of failing the AM container based on previous failures. 
We already have the unhealthy-report mechanism to report serious NM issues, so 
this one should have a higher bar (in some sense, the AM container is more 
important than other containers) based on the failure history. 
My response is based on the above assumption.
bq. why should below container exit status back list the node ?
Such a container failure could be due to resource congestion (like 
KILLED_EXCEEDED_PMEM) or an unknown reason (ABORTED, INVALID) that makes this NM 
more suspect than normal nodes; see the sketch below.
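To make the distinction concrete, a hedged sketch of how such a classification 
could look; the set of statuses treated as suspicious is illustrative, not the 
patch's actual policy:
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

// Illustrative only: classify whether an AM container exit status should count
// against the node for future AM placement. The chosen statuses mirror the ones
// discussed above; the actual patch may classify them differently.
public class AMBlacklistPolicySketch {
  static boolean countsAgainstNode(int exitStatus) {
    switch (exitStatus) {
      case ContainerExitStatus.DISKS_FAILED:
      case ContainerExitStatus.KILLED_EXCEEDED_PMEM:
      case ContainerExitStatus.ABORTED:
      case ContainerExitStatus.INVALID:
        return true;   // node is suspect for AM launches
      case ContainerExitStatus.PREEMPTED:
      case ContainerExitStatus.KILLED_BY_APPMASTER:
        return false;  // clearly not the node's fault
      default:
        return false;
    }
  }
}
{code}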

bq. For DISKS_FAILED which is considered as global blacklist node in this jira, 
I think in this case, the node will report as unhealthy and RM should remove 
the node already.
Some DISKS_FAILED failures could happen because the failed container wrote a 
disk full, while the node still has other directories available. Such a node can 
still run normal containers, but it is not worth risking the AM container on it.

bq. AMBlackListingRequest contains a boolean flag and a threshold number. Do 
you think it’s ok to just use the threshold number only ? 0 means disabled, and 
numbers larger than 0 means enabled?
If so, the job submitter would have to understand how many nodes the current 
cluster has, and the job parameter would have to be updated if the job is 
submitted to a different cluster (with a different number of nodes). IMO, that 
adds more complexity for users.

> Add global blacklist tracking for AM container failure.
> ---
>
> Key: YARN-4635
> URL: https://issues.apache.org/jira/browse/YARN-4635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4635-v2.patch, YARN-4635.patch
>
>
> We need a global blacklist, in addition to each app's blacklist, to track AM 
> container failures that have a global effect. That means we need to 
> differentiate whether a non-succeeded ContainerExitStatus is caused by the NM 
> or is more related to the app. 
> For more details, please refer to the document in YARN-4576.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2016-02-03 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130336#comment-15130336
 ] 

Naganarasimha G R commented on YARN-3367:
-

Thanks [~varun_saxena]. 
True, I will update the patch accordingly in a short while.

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-YARN-2928.v1.005.patch, 
> YARN-3367-YARN-2928.v1.006.patch, YARN-3367-YARN-2928.v1.007.patch, 
> YARN-3367-YARN-2928.v1.008.patch, YARN-3367-YARN-2928.v1.009.patch, 
> YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch, 
> sjlee-suggestion.patch
>
>
> Since YARN-3039, we add a loop in TimelineClient to wait for 
> collectorServiceAddress to be ready before posting any entity. In consumers of 
> TimelineClient (like the AM), we start a new thread for each call to avoid a 
> potential deadlock in the main thread. This approach has at least 3 major 
> defects:
> 1. The consumer needs additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting thread 
> gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the entities into a queue, and a separate thread 
> delivers the queued entities to the collector via REST calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2016-02-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130065#comment-15130065
 ] 

Varun Saxena commented on YARN-3367:


A couple of comments.

# When we are stopping the dispatcher, we log that we are draining it but we 
are not really doing so.
I think we can drain the queue on stop and process the async events or any sync 
event sitting in the queue. We would need to do this before we call shutdownNow, 
as that will interrupt the thread (see the sketch after this list).
# nit : In TestTimelineClientV2Impl#testSyncCall, we have made an extra call to 
{{client.setSleepBeforeReturn(true);}} which is not required.
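As referenced in comment 1, a rough sketch of the drain-then-shutdown ordering; 
the field names and the Runnable task type are placeholders, not the client's 
actual code:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;

// Illustrative only: drain and process whatever is already queued before
// interrupting the dispatcher thread. The field names (entityQueue, executor)
// and the Runnable task type are placeholders, not the actual implementation.
class DispatcherStopSketch {
  private BlockingQueue<Runnable> entityQueue;
  private ExecutorService executor;

  void stop() {
    List<Runnable> pending = new ArrayList<>();
    entityQueue.drainTo(pending);   // take everything still sitting in the queue
    for (Runnable publishTask : pending) {
      publishTask.run();            // deliver queued sync/async events in order
    }
    executor.shutdownNow();         // only now interrupt the worker thread
  }
}
{code}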

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-YARN-2928.v1.005.patch, 
> YARN-3367-YARN-2928.v1.006.patch, YARN-3367-YARN-2928.v1.007.patch, 
> YARN-3367-YARN-2928.v1.008.patch, YARN-3367-YARN-2928.v1.009.patch, 
> YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch, 
> sjlee-suggestion.patch
>
>
> Since YARN-3039, we add a loop in TimelineClient to wait for 
> collectorServiceAddress to be ready before posting any entity. In consumers of 
> TimelineClient (like the AM), we start a new thread for each call to avoid a 
> potential deadlock in the main thread. This approach has at least 3 major 
> defects:
> 1. The consumer needs additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting thread 
> gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the entities into a queue, and a separate thread 
> delivers the queued entities to the collector via REST calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2016-02-03 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130247#comment-15130247
 ] 

Varun Vasudev commented on YARN-4307:
-

+1 for the latest patch. I'll commit this tomorrow if no one objects.

> Blacklisted nodes for AM container is not getting displayed in the Web UI
> -
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: AppInfoPage.png, RMappAttempt.png, 
> YARN-4307.v1.001.patch, YARN-4307.v1.002.patch, YARN-4307.v1.003.patch, 
> YARN-4307.v1.004.patch, YARN-4307.v1.005.patch, webpage.png, 
> yarn-capacity-scheduler-debug.log
>
>
> In a pseudo cluster with 2 NMs, launched an app with an incorrect 
> configuration: *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2016-02-03 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131201#comment-15131201
 ] 

MENG DING commented on YARN-4138:
-

The failed tests are not related.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, 
> YARN-4138.5.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4594) container-executor fails to remove directory tree when chmod required

2016-02-03 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131124#comment-15131124
 ] 

Colin Patrick McCabe commented on YARN-4594:


Thanks, [~jlowe].

> container-executor fails to remove directory tree when chmod required
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.9.0
>
> Attachments: YARN-4594.001.patch, YARN-4594.002.patch, 
> YARN-4594.003.patch, YARN-4594.004.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131157#comment-15131157
 ] 

Hadoop QA commented on YARN-4138:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 234 unchanged - 2 fixed = 235 total (was 236) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 52s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 

[jira] [Commented] (YARN-4667) RM Admin CLI for refreshNodesResources throws NPE when nothing is configured

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131216#comment-15131216
 ] 

Hadoop QA commented on YARN-4667:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 7s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 17s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 154m 30s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || 

[jira] [Updated] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2016-02-03 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4386:
--
Attachment: YARN-4386-v2.patch

Updating the patch with a test that checks whether a decommissioned node can 
ever transition to the running state through the graceful decommissioning 
process. The test TestRMNodeTransitions#testRecommissionNode covers the other 
case, where a node can be recommissioned while in the decommissioning state. 
Since only inactiveRMNodes will contain decommissioned nodes, checking for them 
in the active node list is not useful.

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch, YARN-4386-v2.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entryset from 
> getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is 
> used for checking 'decommissioned' nodes which are present in 
> getInactiveRMNodes() map alone. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler

2016-02-03 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131146#comment-15131146
 ] 

Eric Payne commented on YARN-3769:
--

Thanks [~djp]. I will look into it.

> Consider user limit when calculating total pending resource for preemption 
> policy in Capacity Scheduler
> ---
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.7.3
>
> Attachments: YARN-3769-branch-2.002.patch, 
> YARN-3769-branch-2.6.001.patch, YARN-3769-branch-2.7.002.patch, 
> YARN-3769-branch-2.7.003.patch, YARN-3769-branch-2.7.005.patch, 
> YARN-3769-branch-2.7.006.patch, YARN-3769-branch-2.7.007.patch, 
> YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch, 
> YARN-3769.003.patch, YARN-3769.004.patch, YARN-3769.005.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131212#comment-15131212
 ] 

Hadoop QA commented on YARN-3367:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 37s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
31s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 22s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 54s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
21s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 33s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
48s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
22s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 35s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 12s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 4s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 35s 
{color} | {color:red} root: patch generated 9 new + 711 unchanged - 11 fixed = 
720 total (was 722) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 58s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 54s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 35s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 43s {color} 
| 

[jira] [Commented] (YARN-4594) container-executor fails to remove directory tree when chmod required

2016-02-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130555#comment-15130555
 ] 

Jason Lowe commented on YARN-4594:
--

Thanks for updating the patch!  There are just a couple of remaining bugs, both 
related to remnants from when error codes were negated:
{code}
  ret = recursive_unlink_children(full_path);
  if (ret == ENOENT) {
return 0;
  }
  if (ret != 0) {
fprintf(LOGFILE, "Error while deleting %s: %d (%s)\n",
full_path, -ret, strerror(-ret));
{code}
It's negating ret when it shouldn't at the fprintf call.  Same thing for the 
following instance:
{code}
  if (rmdir(full_path) != 0) {
ret = errno;
if (ret != ENOENT) {
  fprintf(LOGFILE, "Couldn't delete directory %s - %s\n",
  full_path, strerror(-ret));
{code}

It would also be nice to clean up the whitespace nits, although it's no trouble 
cleaning those up as part of the commit.

> container-executor fails to remove directory tree when chmod required
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: YARN-4594.001.patch, YARN-4594.002.patch, 
> YARN-4594.003.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4665) Asynch submit can lose application submissions

2016-02-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130520#comment-15130520
 ] 

Jason Lowe commented on YARN-4665:
--

Wouldn't a REST interface follow the same principle?  I haven't looked at the 
REST API lately, but I'd expect the submission logic to be a POST followed by 
GET polling until the state is ACCEPTED or later.  If the GET results in a 
no-such-app error then the client retries the POST and continues polling.  Yes, 
this is not the most ideal REST interface design, but unless I'm missing 
something it should be functionally equivalent to the RPC path.  In either case 
the client is going to have to do some kind of retry to handle failovers.  Even 
with a synchronous interface we can end up with submissions that appear to fail 
from the client's perspective but actually succeed (because it was successfully 
recorded in the state store before failing to deliver a response to the 
client), so it's not just fire-and-forget from the client's perspective.
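A rough sketch of that retry loop; submitApp(), getAppState(), 
isAcceptedOrLater() and NoSuchAppException are hypothetical helpers around the 
REST calls, not an actual client API — the point is only the ordering of POST, 
GET polling, and re-POST on a no-such-app response:
{code}
// Illustrative only: submit, then poll; if the app is unknown (e.g. the RM
// failed over before persisting it), re-submit and keep polling. All helper
// names here are hypothetical.
void submitWithRetry(String appId, Object submissionContext) throws Exception {
  submitApp(submissionContext);                // POST the submission
  String state = null;
  while (!isAcceptedOrLater(state)) {
    Thread.sleep(1000);                        // poll interval is arbitrary
    try {
      state = getAppState(appId);              // GET the current state
    } catch (NoSuchAppException e) {
      submitApp(submissionContext);            // RM never saw it: retry the POST
    }
  }
}
{code}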

> Asynch submit can lose application submissions
> --
>
> Key: YARN-4665
> URL: https://issues.apache.org/jira/browse/YARN-4665
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> The change introduced in YARN-514 opens up a hole into which applications can 
> fall and be lost.  Prior to YARN-514, the {{submitApplication()}} call did 
> not complete until the application state was persisted to the state store.  
> After YARN-514, the {{submitApplication()}} call is asynchronous, with the 
> application state being saved later.
> If the state store is slow or unresponsive, it may be that an application's 
> state may not be persisted for quite a while.  During that time, if the RM 
> fails (over), all applications that have not yet been persisted to the state 
> store will be lost.  If the active RM loses ZK connectivity, a significant 
> number of job submissions can pile up before the ZK connection times out, 
> resulting in a large pile of client failures when it finally does.
> This issue is inherent in the design of YARN-514.  I see three solutions:
> 1. Add a WAL to the state store. HBase does it, so we know how to do it. It 
> seems like a heavy solution to the original problem, however.  It's certainly 
> not a trivial change.
> 2. Revert YARN-514 and update the RPC layer to allow a connection to be 
> parked if it's doing something that may take a while. This is a generally 
> useful feature but could be a deep rabbit hole.
> 3. Revert YARN-514 and add back-pressure to the job submission. For example, 
> we set a maximum number of threads that can simultaneously be assigned to 
> handle job submissions.  When that threshold is reached, new job submissions 
> get a try-again-later response. This is also a generally useful feature and 
> should be a fairly constrained set of changes.
> I think the third option is the most approachable.  It's the smallest change, 
> and it adds useful behavior beyond solving the original issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2016-02-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3367:

Attachment: YARN-3367-YARN-2928.v1.010.patch

Thanks [~varun_saxena] for the comments
bq. I think we can try to drain the queue on stop and process the async events 
or some sync event sitting in the queue. We would need to do this before we 
call shutdownNow as that will interrupt the thread.
I had earlier tried to take care of draining but missed it in later patches. 
IMO, though, we should not wait indefinitely, as there is a chance the server 
might be down. So what I have done in the patch is to use shutdown, so that the 
live workers are not interrupted, and to wait for 10 seconds before exiting. If 
we want a more sophisticated approach, we need to introduce some additional 
logic so that it drains everything without getting blocked. Thoughts?
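
A minimal sketch of that stop behaviour (assuming the publisher is backed by a 
plain ExecutorService; the names here are illustrative, not the actual 
TimelineClient internals):
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class BoundedDrainOnStop {
  static void stopPublisher(ExecutorService publisher) throws InterruptedException {
    publisher.shutdown();                      // stop accepting work, let queued tasks drain
    if (!publisher.awaitTermination(10, TimeUnit.SECONDS)) {
      publisher.shutdownNow();                 // grace period exhausted: interrupt what is left
    }
  }
}
{code}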

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-YARN-2928.v1.005.patch, 
> YARN-3367-YARN-2928.v1.006.patch, YARN-3367-YARN-2928.v1.007.patch, 
> YARN-3367-YARN-2928.v1.008.patch, YARN-3367-YARN-2928.v1.009.patch, 
> YARN-3367-YARN-2928.v1.010.patch, YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch, 
> sjlee-suggestion.patch
>
>
> Since YARN-3039, we added a loop in TimelineClient to wait for the 
> collectorServiceAddress to be ready before posting any entity. In consumers of 
> TimelineClient (like the AM), we are starting a new thread for each call to 
> get rid of a potential deadlock in the main thread. This approach has at least 
> 3 major defects:
> 1. The consumer needs some additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It consumes many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting 
> operation's thread exits the waiting loop at a random time.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread delivers the queued entities to the collector via REST calls.
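
A minimal sketch of that queue-plus-dispatcher idea (illustrative only; the 
class and method names are hypothetical, not the actual TimelineClient code):
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EntityPublishLoop {
  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
  private final Thread dispatcher = new Thread(() -> {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        Object entity = queue.take();   // blocks until an entity is available
        publishViaRest(entity);         // single writer, so posting order is preserved
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }, "TimelineEntityPublisher");

  EntityPublishLoop() {
    dispatcher.setDaemon(true);
    dispatcher.start();
  }

  // Callers only enqueue; no per-call thread is needed and the call never blocks on the server.
  void putEntities(Object entity) {
    queue.add(entity);
  }

  private void publishViaRest(Object entity) {
    // placeholder for the REST call to the collector
  }
}
{code}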



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4669) Fix logging statements in resource manager's Application class

2016-02-03 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-4669:

Attachment: YARN-4669.001.patch

uploaded patch with logging fix.

> Fix logging statements in resource manager's Application class
> --
>
> Key: YARN-4669
> URL: https://issues.apache.org/jira/browse/YARN-4669
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Trivial
> Attachments: YARN-4669.001.patch
>
>
> There seem to be a couple of System.out.println() calls that should be 
> replaced by info/debug logging.
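
A hypothetical before/after for this kind of change (the actual class and 
messages are in the attached patch; commons-logging is assumed, as used by 
Hadoop 2.x):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class LoggingFixExample {
  private static final Log LOG = LogFactory.getLog(LoggingFixExample.class);

  void report(String appId) {
    // Before: System.out.println("Application " + appId + " requests cleared");
    if (LOG.isDebugEnabled()) {
      LOG.debug("Application " + appId + " requests cleared");
    }
  }
}
{code}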



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Enhance filters in TimelineReader

2016-02-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131470#comment-15131470
 ] 

Sangjin Lee commented on YARN-3863:
---

This needs to be redone after YARN-4446, correct?

> Enhance filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently, filters in the timeline reader will return an entity only if all 
> the filter conditions hold true, i.e. only the AND operation is supported. We 
> can support the OR operation for the filters as well. Additionally, as the 
> primary backend implementation is HBase, we can design our filters in a manner 
> where they closely resemble HBase filters.
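
For reference, this is the kind of composition HBase itself offers (a sketch 
with made-up column names, not code from the attached WIP patches): a FilterList 
with MUST_PASS_ALL gives AND semantics and MUST_PASS_ONE gives OR semantics.
{code}
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

class FilterComposition {
  // Matches rows whose "status" column is either RUNNING or FINISHED (OR).
  static FilterList runningOrFinished() {
    FilterList or = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    or.addFilter(new SingleColumnValueFilter(Bytes.toBytes("i"), Bytes.toBytes("status"),
        CompareOp.EQUAL, Bytes.toBytes("RUNNING")));
    or.addFilter(new SingleColumnValueFilter(Bytes.toBytes("i"), Bytes.toBytes("status"),
        CompareOp.EQUAL, Bytes.toBytes("FINISHED")));
    return or;
  }
}
{code}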



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2016-02-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131476#comment-15131476
 ] 

Sangjin Lee commented on YARN-2005:
---

Would this be a good candidate for backporting to 2.6.x and 2.7.x? [~adhoot], 
thoughts?

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch, YARN-2005.009.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4669) Fix logging statements in resource manager's Application class

2016-02-03 Thread Sidharta Seethana (JIRA)
Sidharta Seethana created YARN-4669:
---

 Summary: Fix logging statements in resource manager's Application 
class
 Key: YARN-4669
 URL: https://issues.apache.org/jira/browse/YARN-4669
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
Priority: Trivial


There seem to be a couple of System.out.println() calls that should be replaced 
by info/debug logging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131390#comment-15131390
 ] 

Hadoop QA commented on YARN-4386:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 43s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 50s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 21s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 152m 44s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || 

[jira] [Commented] (YARN-3669) Attempt-failures validatiy interval should have a global admin configurable lower limit

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131361#comment-15131361
 ] 

Hadoop QA commented on YARN-3669:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 3s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 2s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 38s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} 

[jira] [Commented] (YARN-4409) Fix javadoc and checkstyle issues in timelineservice code

2016-02-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131464#comment-15131464
 ] 

Sangjin Lee commented on YARN-4409:
---

Hi [~varun_saxena], could you refresh this patch to apply cleanly on the 
branch? Thanks.

> Fix javadoc and checkstyle issues in timelineservice code
> -
>
> Key: YARN-4409
> URL: https://issues.apache.org/jira/browse/YARN-4409
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4409-YARN-2928.wip.01.patch
>
>
> There are a large number of javadoc and checkstyle issues currently open in 
> timelineservice code. We need to fix them before we merge it into trunk.
> Refer to 
> https://issues.apache.org/jira/browse/YARN-3862?focusedCommentId=15035267=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035267
> We still have 94 open checkstyle issues and javadocs failing for Java 8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4670) add logging when a node is AM-blacklisted

2016-02-03 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4670:
-

 Summary: add logging when a node is AM-blacklisted
 Key: YARN-4670
 URL: https://issues.apache.org/jira/browse/YARN-4670
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Trivial


Today there is not much logging happening when a node is blacklisted for an AM 
(see YARN-2005). We can add a little more logging to see this activity easily 
from the RM logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4502) Fix two AM containers get allocated when AM restart

2016-02-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4502:
--
Target Version/s: 2.7.3, 2.6.5  (was: 2.6.5)

> Fix two AM containers get allocated when AM restart
> ---
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-4502-20160114.txt, YARN-4502-20160212.txt
>
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for the new AM pid 
> Here, the 2nd AM container was supposed to be started on 
> container_e12_1450825622869_0001_02_01. But the AM was not launched on 
> container_e12_1450825622869_0001_02_01; it was in the ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: the RM should not start 2 containers for the AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2016-02-03 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131526#comment-15131526
 ] 

Kuhu Shukla commented on YARN-4386:
---

bq. Updating the patch with a test to check whether a decommissioned node can 
ever transition to the running state via the graceful decommissioning process. 
The test TestRMNodeTransitions#testRecommissionNode covers the other case, 
where a node can be recommissioned after being in the decommissioning state. 
Since we know that only inactiveRMNodes will contain the decommissioned node, 
the check for such a node in the active list is not useful.

[~djp], [~sunilg] Request for comments/review. Thanks a lot!

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch, YARN-4386-v2.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entry set from 
> getRMNodes(), which has only active nodes (RUNNING, DECOMMISSIONING, etc.), is 
> used for checking 'decommissioned' nodes, which are present in the 
> getInactiveRMNodes() map alone. 
> {code}
> for (Entry entry:rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}
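
Illustrative only (not the attached patch): given that DECOMMISSIONED nodes 
exist only in the inactive map, the recommission check would have to consult 
that map, e.g. something along these lines (same context and types as the 
snippet above):
{code}
for (RMNode node : rmContext.getInactiveRMNodes().values()) {
  if (node.getState() == NodeState.DECOMMISSIONED) {
    rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeEvent(node.getNodeID(), RMNodeEventType.RECOMMISSION));
  }
}
{code}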



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4670) add logging when a node is AM-blacklisted

2016-02-03 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131643#comment-15131643
 ] 

Naganarasimha G R commented on YARN-4670:
-

Hi [~sjlee0],
To an extent we will now be able to find it out after YARN-4307 and YARN-3946, 
and in the trunk code I can also see a debug log for this in 
{{SchedulerAppUtils.isBlackListed}}. Is anything more planned for this?

> add logging when a node is AM-blacklisted
> -
>
> Key: YARN-4670
> URL: https://issues.apache.org/jira/browse/YARN-4670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Trivial
>
> Today there is not much logging happening when a node is blacklisted for an 
> AM (see YARN-2005). We can add a little more logging to see this activity 
> easily from the RM logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4669) Fix logging statements in resource manager's Application class

2016-02-03 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131600#comment-15131600
 ] 

Sidharta Seethana commented on YARN-4669:
-

Test failures seem unrelated.

> Fix logging statements in resource manager's Application class
> --
>
> Key: YARN-4669
> URL: https://issues.apache.org/jira/browse/YARN-4669
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Trivial
> Attachments: YARN-4669.001.patch
>
>
> There seem to be a couple of System.out.println() calls that should be 
> replaced by info/debug logging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4667) RM Admin CLI for refreshNodesResources throws NPE when nothing is configured

2016-02-03 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131654#comment-15131654
 ] 

Naganarasimha G R commented on YARN-4667:
-

{{TestClientRMTokens}} and {{TestAMAuthorization}} are already tracked in other 
jiras...

> RM Admin CLI for refreshNodesResources throws NPE when nothing is configured
> 
>
> Key: YARN-4667
> URL: https://issues.apache.org/jira/browse/YARN-4667
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4667.v1.001.patch
>
>
> {quote}
> $ ./yarn rmadmin -refreshNodesResources
> 16/02/03 10:54:27 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8033
> refreshNodesResources: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshNodesResources(AdminService.java:655)
>   at 
> org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.refreshNodesResources(ResourceManagerAdministrationProtocolPBServiceImpl.java:246)
>   at 
> org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:287)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4669) Fix logging statements in resource manager's Application class

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131525#comment-15131525
 ] 

Hadoop QA commented on YARN-4669:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 47s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 17s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 23s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || 

[jira] [Updated] (YARN-4409) Fix javadoc and checkstyle issues in timelineservice code

2016-02-03 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4409:
---
Attachment: YARN-4409-YARN-2928.01.patch

> Fix javadoc and checkstyle issues in timelineservice code
> -
>
> Key: YARN-4409
> URL: https://issues.apache.org/jira/browse/YARN-4409
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4409-YARN-2928.01.patch, 
> YARN-4409-YARN-2928.wip.01.patch
>
>
> There are a large number of javadoc and checkstyle issues currently open in 
> timelineservice code. We need to fix them before we merge it into trunk.
> Refer to 
> https://issues.apache.org/jira/browse/YARN-3862?focusedCommentId=15035267=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035267
> We still have 94 open checkstyle issues and javadocs failing for Java 8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4446) Refactor reader API for better extensibility

2016-02-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131564#comment-15131564
 ] 

Varun Saxena commented on YARN-4446:


Thanks [~sjlee0] for the review and commit.

> Refactor reader API for better extensibility
> 
>
> Key: YARN-4446
> URL: https://issues.apache.org/jira/browse/YARN-4446
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
>
> Attachments: YARN-4446-YARN-2928.01.patch, 
> YARN-4446-YARN-2928.02.patch, YARN-4446-YARN-2928.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4409) Fix javadoc and checkstyle issues in timelineservice code

2016-02-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131568#comment-15131568
 ] 

Varun Saxena commented on YARN-4409:


Yes, will do so.
I have the patch ready.

> Fix javadoc and checkstyle issues in timelineservice code
> -
>
> Key: YARN-4409
> URL: https://issues.apache.org/jira/browse/YARN-4409
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4409-YARN-2928.wip.01.patch
>
>
> There are a large number of javadoc and checkstyle issues currently open in 
> timelineservice code. We need to fix them before we merge it into trunk.
> Refer to 
> https://issues.apache.org/jira/browse/YARN-3862?focusedCommentId=15035267=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035267
> We still have 94 open checkstyle issues and javadocs failing for Java 8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4635) Add global blacklist tracking for AM container failure.

2016-02-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131632#comment-15131632
 ] 

Jian He commented on YARN-4635:
---

bq. Some DISKS_FAILED could happen because the failed container wrote a disk to 
full. But the node could still have other directories available to use. It 
could still launch normal containers but is not suitable for risking an AM 
container.
In the current code, the DISKS_FAILED status is set when this condition is true:
{code}
  if (!dirsHandler.areDisksHealthy()) {
ret = ContainerExitStatus.DISKS_FAILED;
throw new IOException("Most of the disks failed. "
+ dirsHandler.getDisksHealthReport(false));
  }
{code}
The same check, {{dirsHandler.areDisksHealthy}}, is used by the disk health monitor:
{code}
  boolean isHealthy() {
boolean scriptHealthStatus = (nodeHealthScriptRunner == null) ? true
: nodeHealthScriptRunner.isHealthy();
return scriptHealthStatus && dirsHandler.areDisksHealthy();
  }
{code}
Essentially, if this condition is false, the node will be reported as unhealthy 
in the first place, which makes the RM remove the node. So the global blacklist 
is not useful in practice, because the node is already removed. Maybe I missed 
something; a unit test can prove this.

bq. If so, it means the job submitter has to understand how many nodes the 
current cluster has 
Sorry, I don't understand why the job submitter needs to understand the number 
of nodes. What I meant is that right now a boolean flag (false) is used to 
indicate that this feature is disabled. Alternatively, a 0 threshold can 
achieve the same result (with a logic change on the RM side). I said this 
because I feel the API may look simpler that way, and we don't need a separate 
nested AMBlackListingRequest class; having the threshold set in the 
submissionContext will be enough. But I don't have a strong opinion on this. 
The current way is OK too.

> Add global blacklist tracking for AM container failure.
> ---
>
> Key: YARN-4635
> URL: https://issues.apache.org/jira/browse/YARN-4635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4635-v2.patch, YARN-4635.patch
>
>
> We need a global blacklist in addition to each app's blacklist to track AM 
> container failures that have a global effect. That means we need to 
> differentiate whether the reason for a non-succeeded ContainerExitStatus comes 
> from the NM or is more related to the app. 
> For more details, please refer to the document in YARN-4576.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4594) container-executor fails to remove directory tree when chmod required

2016-02-03 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated YARN-4594:
---
Attachment: YARN-4594.004.patch

Sigh.  The negation is a really hard habit to break... it's the pattern for how 
errors are handled in the kernel.  This should fix it.

I also changed it to use "fullpath" when printing error messages, to make it 
easier to figure out which file had a problem.

Fixed the whitespace nits as well.

Thanks

> container-executor fails to remove directory tree when chmod required
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: YARN-4594.001.patch, YARN-4594.002.patch, 
> YARN-4594.003.patch, YARN-4594.004.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4594) container-executor fails to remove directory tree when chmod required

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130640#comment-15130640
 ] 

Hadoop QA commented on YARN-4594:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 48s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 17s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 6s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12786037/YARN-4594.004.patch |
| JIRA Issue | YARN-4594 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 5021c6c9ce40 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 1adb64e |
| Default Java | 1.7.0_91 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_66 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 |
| JDK v1.7.0_91  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/10482/testReport/ |
| modules | C:  
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
  U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Max memory used | 77MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10482/console |


This message was automatically generated.



> container-executor 

[jira] [Updated] (YARN-4409) Fix javadoc and checkstyle issues in timelineservice code

2016-02-03 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4409:
---
Attachment: (was: YARN-4409-YARN-2928.wip.01.patch)

> Fix javadoc and checkstyle issues in timelineservice code
> -
>
> Key: YARN-4409
> URL: https://issues.apache.org/jira/browse/YARN-4409
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4409-YARN-2928.01.patch
>
>
> There are a large number of javadoc and checkstyle issues currently open in 
> timelineservice code. We need to fix them before we merge it into trunk.
> Refer to 
> https://issues.apache.org/jira/browse/YARN-3862?focusedCommentId=15035267=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035267
> We still have 94 open checkstyle issues and javadocs failing for Java 8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4307) Display blacklisted nodes for AM container in the RM web UI

2016-02-03 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4307:

Summary: Display blacklisted nodes for AM container in the RM web UI  (was: 
Blacklisted nodes for AM container is not getting displayed in the Web UI)

> Display blacklisted nodes for AM container in the RM web UI
> ---
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: AppInfoPage.png, RMappAttempt.png, 
> YARN-4307.v1.001.patch, YARN-4307.v1.002.patch, YARN-4307.v1.003.patch, 
> YARN-4307.v1.004.patch, YARN-4307.v1.005.patch, webpage.png, 
> yarn-capacity-scheduler-debug.log
>
>
> In a pseudo cluster with 2 NMs, I launched an app with an incorrect 
> configuration: *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. In the scheduler logs I found that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4409) Fix javadoc and checkstyle issues in timelineservice code

2016-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131866#comment-15131866
 ] 

Hadoop QA commented on YARN-4409:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
5s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 0s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
58s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
45s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 2s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 34s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 2 new + 
36 unchanged - 346 fixed = 38 total (was 382) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
36s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 17s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_66 with JDK v1.8.0_66 
generated 6 new + 94 unchanged - 6 fixed = 100 total (was 100) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 57s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 31s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 15s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 

[jira] [Commented] (YARN-4670) add logging when a node is AM-blacklisted

2016-02-03 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131820#comment-15131820
 ] 

Naganarasimha G R commented on YARN-4670:
-

YARN-3946 tells us if the app's AM is stuck because it cannot be scheduled on a 
node due to that node being blacklisted for launching AMs for this app.
bq. Regarding logging in SchedulerAppUtils.isBlackListed(), does that get used 
for the AM blacklisting too? It's not obvious to me. 
The call chain is *SchedulerAppUtils.isBlackListed* -> 
*SchedulerApplicationAttempt.isBlackListed* -> 
*AppSchedulingInfo.isBlackListed*, and in the last call they check the AM 
blacklist, so in a way the logging is there, but as you said it's not too 
obvious.
The ??volume of logging?? depends on the number of nodes that are free, how 
many are blacklisted for the application, and how long the other nodes stay 
occupied, but IMO it would be a rare scenario for the volume to explode, and we 
can add logging for it in *AppSchedulingInfo.isBlackListed*. Thoughts?
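
A purely illustrative sketch of the suggestion (the class, method, and field 
names below are hypothetical, not the actual scheduler code), showing the kind 
of INFO-level message that would make the activity visible in the RM log:
{code}
import java.util.Set;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class AmBlacklistLogging {
  private static final Log LOG = LogFactory.getLog(AmBlacklistLogging.class);

  // Hypothetical helper: log at INFO when a node is skipped because it is AM-blacklisted.
  static boolean isBlacklistedForAm(Set<String> amBlacklist, String nodeName, String attemptId) {
    if (amBlacklist.contains(nodeName)) {
      LOG.info("Skipping node " + nodeName + " for AM container of " + attemptId
          + ": node is AM-blacklisted");
      return true;
    }
    return false;
  }
}
{code}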


> add logging when a node is AM-blacklisted
> -
>
> Key: YARN-4670
> URL: https://issues.apache.org/jira/browse/YARN-4670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Trivial
>
> Today there is not much logging happening when a node is blacklisted for an 
> AM (see YARN-2005). We can add a little more logging to see this activity 
> easily from the RM logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2016-02-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131755#comment-15131755
 ] 

Jian He commented on YARN-4138:
---

bq. We only confirm the resource when the NM-reported resource is the same as 
the RM resource.
Thanks for the explanation. I wonder why the decision was made to reset to the 
initial resource; in this case, the first increase happened successfully from 
the app's point of view. Will this confuse apps if the resource somehow 
decreases back to the initial value?

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, 
> YARN-4138.5.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4662) Document some newly added metrics

2016-02-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131688#comment-15131688
 ] 

Xuan Gong commented on YARN-4662:
-

+1 LGTM. Checking this in

> Document some newly added metrics
> -
>
> Key: YARN-4662
> URL: https://issues.apache.org/jira/browse/YARN-4662
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4662.1.patch, YARN-4662.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4662) Document some newly added metrics

2016-02-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131699#comment-15131699
 ] 

Xuan Gong commented on YARN-4662:
-

Committed into trunk/branch-2/branch-2.8. Thanks, Jian !

> Document some newly added metrics
> -
>
> Key: YARN-4662
> URL: https://issues.apache.org/jira/browse/YARN-4662
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.8.0
>
> Attachments: YARN-4662.1.patch, YARN-4662.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4662) Document some newly added metrics

2016-02-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131707#comment-15131707
 ] 

Hudson commented on YARN-4662:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9243 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9243/])
YARN-4662. Document some newly added metrics. Contributed by Jian He (xgong: 
rev 63c63e298cf9ff252532297deedde15e77323809)
* hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md
* hadoop-yarn-project/CHANGES.txt


> Document some newly added metrics
> -
>
> Key: YARN-4662
> URL: https://issues.apache.org/jira/browse/YARN-4662
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.8.0
>
> Attachments: YARN-4662.1.patch, YARN-4662.2.patch
>
>






[jira] [Commented] (YARN-4670) add logging when a node is AM-blacklisted

2016-02-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131726#comment-15131726
 ] 

Sangjin Lee commented on YARN-4670:
---

Thanks for the info. I missed YARN-4307.

Regarding logging in SchedulerAppUtils.isBlackListed(), does that get used for 
the *AM* blacklisting too? It's not obvious to me. I was looking more at 
RMAppAttemptImpl.sendAMContainerToNM(). Also, it would be good if this could be 
logged at the INFO level; I don't think the volume of this logging will be 
too high, and logging it during normal operation would be useful.
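As a rough illustration (hypothetical class and method names, not the actual 
RMAppAttemptImpl code), the suggested INFO-level log could look like:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class AmBlacklistLoggingSketch {
  private static final Log LOG = LogFactory.getLog(AmBlacklistLoggingSketch.class);

  /** Hypothetical hook invoked when a node is added to the AM blacklist. */
  static void onNodeAmBlacklisted(String nodeId, String appAttemptId) {
    // INFO seems appropriate: the event is rare and useful during normal operation.
    LOG.info("Node " + nodeId + " is AM-blacklisted for attempt " + appAttemptId);
  }
}
{code}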

> add logging when a node is AM-blacklisted
> -
>
> Key: YARN-4670
> URL: https://issues.apache.org/jira/browse/YARN-4670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Trivial
>
> Today there is not much logging happening when a node is blacklisted for an 
> AM (see YARN-2005). We can add a little more logging to see this activity 
> easily from the RM logs.





[jira] [Commented] (YARN-4635) Add global blacklist tracking for AM container failure.

2016-02-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130324#comment-15130324
 ] 

Sunil G commented on YARN-4635:
---

Thanks [~jianhe] for the comments and thanks [~djp] for the clarifications.

bq.Do you think it’s ok to just use the threshold number only ? 0 means 
disabled, and numbers larger than 0 means enabled
Adding one more minor point on using only a threshold: if an app specifies the 
{{AMBlackListingRequest}} flag as false, then global blacklisting will not be 
applicable for that app. Such control is easier with a flag, I think. How do you 
feel?

> Add global blacklist tracking for AM container failure.
> ---
>
> Key: YARN-4635
> URL: https://issues.apache.org/jira/browse/YARN-4635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4635-v2.patch, YARN-4635.patch
>
>
> We need a global blacklist in addition to each app’s blacklist to track AM 
> container failures with global effect. That means we need to differentiate 
> whether a non-succeeded ContainerExitStatus is caused by the NM or is more 
> related to the app. 
> For more details, please refer to the document in YARN-4576.





[jira] [Commented] (YARN-4665) Asynch submit can lose application submissions

2016-02-03 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130701#comment-15130701
 ] 

Varun Vasudev commented on YARN-4665:
-

Jason's understanding of the REST API is correct - the user submits the app 
using POST and polls using GET. Internally the functionality uses the same code 
flow as the RPC path - all calls flow through 
ClientRMService#submitApplication. The RMAppManager has a check - {code}if 
(rmContext.getRMApps().putIfAbsent(applicationId, application) != null){code} - 
so subsequent re-submits should not result in anything destructive.
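As a side note, a minimal standalone sketch of the putIfAbsent behaviour being 
relied on here (a plain ConcurrentMap, not the actual RMAppManager internals):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentSketch {
  public static void main(String[] args) {
    ConcurrentMap<String, String> apps = new ConcurrentHashMap<String, String>();
    // The first submission for an id is stored; putIfAbsent returns null.
    System.out.println(apps.putIfAbsent("application_1454466540000_0001", "first submit"));
    // A re-submit of the same id does not replace the stored entry; the existing
    // value is returned, so the duplicate can be detected and ignored or rejected.
    System.out.println(apps.putIfAbsent("application_1454466540000_0001", "re-submit"));
  }
}
{code}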

> Asynch submit can lose application submissions
> --
>
> Key: YARN-4665
> URL: https://issues.apache.org/jira/browse/YARN-4665
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> The change introduced in YARN-514 opens up a hole into which applications can 
> fall and be lost.  Prior to YARN-514, the {{submitApplication()}} call did 
> not complete until the application state was persisted to the state store.  
> After YARN-514, the {{submitApplication()}} call is asynchronous, with the 
> application state being saved later.
> If the state store is slow or unresponsive, it may be that an application's 
> state may not be persisted for quite a while.  During that time, if the RM 
> fails (over), all applications that have not yet been persisted to the state 
> store will be lost.  If the active RM loses ZK connectivity, a significant 
> number of job submissions can pile up before the ZK connection times out, 
> resulting in a large pile of client failures when it finally does.
> This issue is inherent in the design of YARN-514.  I see three solutions:
> 1. Add a WAL to the state store. HBase does it, so we know how to do it. It 
> seems like a heavy solution to the original problem, however.  It's certainly 
> not a trivial change.
> 2. Revert YARN-514 and update the RPC layer to allow a connection to be 
> parked if it's doing something that may take a while. This is a generally 
> useful feature but could be a deep rabbit hole.
> 3. Revert YARN-514 and add back-pressure to the job submission. For example, 
> we set a maximum number of threads that can simultaneously be assigned to 
> handle job submissions.  When that threshold is reached, new job submissions 
> get a try-again-later response. This is also a generally useful feature and 
> should be a fairly constrained set of changes.
> I think the third option is the most approachable.  It's the smallest change, 
> and it adds useful behavior beyond solving the original issue.
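A minimal sketch of the back-pressure idea in option 3 above (a hypothetical 
bounded permit gate with illustrative names, not actual RM code):
{code}
import java.util.concurrent.Semaphore;

public class SubmissionGateSketch {
  // Illustrative cap on submissions that may be in flight (i.e. not yet persisted).
  private static final int MAX_INFLIGHT_SUBMISSIONS = 100;
  private final Semaphore permits = new Semaphore(MAX_INFLIGHT_SUBMISSIONS);

  /** Returns true if the submission may proceed; false means "try again later". */
  public boolean tryEnter() {
    return permits.tryAcquire();
  }

  /** Called once the application state has been written to the state store. */
  public void exit() {
    permits.release();
  }
}
{code}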





[jira] [Commented] (YARN-4594) container-executor fails to remove directory tree when chmod required

2016-02-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130733#comment-15130733
 ] 

Hudson commented on YARN-4594:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9239 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9239/])
YARN-4594. container-executor fails to remove directory tree when chmod (jlowe: 
rev fa328e2d39eda1c479389b99a5c121e640a1e0ad)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


> container-executor fails to remove directory tree when chmod required
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.9.0
>
> Attachments: YARN-4594.001.patch, YARN-4594.002.patch, 
> YARN-4594.003.patch, YARN-4594.004.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.





[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2016-02-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130971#comment-15130971
 ] 

Sangjin Lee commented on YARN-3367:
---

I agree it might be slightly better to try to drain the queue when it's 
shutting down. But we need to be clear that this is still on a best-effort 
basis. Also, let's not increase the wait time; it might unnecessarily add to 
the stop time.

I think there are ways to do it, but given the structure of the dispatcher 
code, it might be more practical to use a finally clause (outside the while 
loop). Note that the shutdown will come to this thread in the form of an 
interrupt. Otherwise, more restructuring of that code is needed.
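A rough sketch of that shape (assuming a BlockingQueue-backed dispatcher; the 
names and the String stand-in for entities are placeholders, not the actual patch):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainOnStopSketch implements Runnable {
  private final BlockingQueue<String> entityQueue = new LinkedBlockingQueue<String>();

  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        // stop() interrupts this thread, which makes take() throw InterruptedException.
        publish(entityQueue.take());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      // Best-effort drain on shutdown; drainTo() does not block, so it does not
      // extend the stop timeout.
      List<String> remaining = new ArrayList<String>();
      entityQueue.drainTo(remaining);
      for (String entity : remaining) {
        publish(entity);
      }
    }
  }

  private void publish(String entity) {
    System.out.println("posting entity: " + entity);  // stands in for the REST call
  }
}
{code}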

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-YARN-2928.v1.005.patch, 
> YARN-3367-YARN-2928.v1.006.patch, YARN-3367-YARN-2928.v1.007.patch, 
> YARN-3367-YARN-2928.v1.008.patch, YARN-3367-YARN-2928.v1.009.patch, 
> YARN-3367-YARN-2928.v1.010.patch, YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch, 
> sjlee-suggestion.patch
>
>
> Since YARN-3039, we add a loop in TimelineClient to wait for the 
> collectorServiceAddress to be ready before posting any entity. In consumers of 
> TimelineClient (like the AM), we start a new thread for each call to get rid 
> of a potential deadlock in the main thread. This approach has at least 3 major 
> defects:
> 1. The consumer needs some additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting thread 
> gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread delivers the queued entities to the collector via REST calls.





[jira] [Commented] (YARN-4446) Refactor reader API for better extensibility

2016-02-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130974#comment-15130974
 ] 

Sangjin Lee commented on YARN-4446:
---

+1. I'll commit it soon. Please let me know now if you have any additional 
feedback.

> Refactor reader API for better extensibility
> 
>
> Key: YARN-4446
> URL: https://issues.apache.org/jira/browse/YARN-4446
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4446-YARN-2928.01.patch, 
> YARN-4446-YARN-2928.02.patch, YARN-4446-YARN-2928.03.patch
>
>






[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2016-02-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130988#comment-15130988
 ] 

Sangjin Lee commented on YARN-4183:
---

I am +1 on the latest patch, but I'd wait until Mit and/or Jon chime in.

[~mitdesai], [~jeagles], what are your thoughts? Is the conclusion here 
acceptable to you?

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Naganarasimha G R
> Attachments: YARN-4183.1.patch, YARN-4183.v1.001.patch, 
> YARN-4183.v1.002.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.





[jira] [Commented] (YARN-4594) container-executor fails to remove directory tree when chmod required

2016-02-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130708#comment-15130708
 ] 

Jason Lowe commented on YARN-4594:
--

+1 lgtm.  Committing this.

> container-executor fails to remove directory tree when chmod required
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: YARN-4594.001.patch, YARN-4594.002.patch, 
> YARN-4594.003.patch, YARN-4594.004.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.





[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2016-02-03 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138.5.patch

Hi, [~jianhe]

bq. After step 6, rmContainer.getLastConfirmedResource() will return 3G, when 
the expire event gets triggered, won't it reset it back to 3G?

No, it won't reset it back to 3G. rmContainer.getLastConfirmedResource() will 
not return 3G after step 6; it is still 1G. We only confirm the resource when the 
NM-reported resource is the same as the RM resource. In this test case, the 
NM-reported resource is 3G, but the RM-allocated resource is 6G, so 3G is NOT 
confirmed. This issue was discussed in this thread a while ago: 
https://issues.apache.org/jira/browse/YARN-4138?focusedCommentId=14737229=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737229

bq. I think RMContainerImpl will not receive EXPIRE event at RUNNING state 
after this patch ? if so, we can remove this.

You are right, we can remove this. Attaching the latest patch that removes it.
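For illustration, the confirm/roll-back rule described above can be sketched 
roughly as follows (field and method names are placeholders, not the actual 
RMContainerImpl code):
{code}
import org.apache.hadoop.yarn.api.records.Resource;

class ConfirmRollbackSketch {
  private Resource allocated;       // what the RM has granted (e.g. 6G)
  private Resource lastConfirmed;   // last value the NM has caught up to (e.g. 1G)

  /** Called when the NM reports the container's current resource. */
  void onNMReport(Resource nmReported) {
    // Confirm only when the NM-reported resource matches the RM allocation;
    // an intermediate value (e.g. 3G while 6G is allocated) is NOT confirmed.
    if (nmReported.equals(allocated)) {
      lastConfirmed = nmReported;
    }
  }

  /** Called when the resource-increase token expires before confirmation. */
  void onIncreaseTokenExpired() {
    // Roll back the allocation to the last confirmed value instead of killing
    // the container.
    allocated = lastConfirmed;
  }
}
{code}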


> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, 
> YARN-4138.5.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.





[jira] [Assigned] (YARN-4665) Asynch submit can lose application submissions

2016-02-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-4665:
---

Assignee: Naganarasimha G R  (was: Daniel Templeton)

> Asynch submit can lose application submissions
> --
>
> Key: YARN-4665
> URL: https://issues.apache.org/jira/browse/YARN-4665
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Daniel Templeton
>Assignee: Naganarasimha G R
>
> The change introduced in YARN-514 opens up a hole into which applications can 
> fall and be lost.  Prior to YARN-514, the {{submitApplication()}} call did 
> not complete until the application state was persisted to the state store.  
> After YARN-514, the {{submitApplication()}} call is asynchronous, with the 
> application state being saved later.
> If the state store is slow or unresponsive, it may be that an application's 
> state may not be persisted for quite a while.  During that time, if the RM 
> fails (over), all applications that have not yet been persisted to the state 
> store will be lost.  If the active RM loses ZK connectivity, a significant 
> number of job submissions can pile up before the ZK connection times out, 
> resulting in a large pile of client failures when it finally does.
> This issue is inherent in the design of YARN-514.  I see three solutions:
> 1. Add a WAL to the state store. HBase does it, so we know how to do it. It 
> seems like a heavy solution to the original problem, however.  It's certainly 
> not a trivial change.
> 2. Revert YARN-514 and update the RPC layer to allow a connection to be 
> parked if it's doing something that may take a while. This is a generally 
> useful feature but could be a deep rabbit hole.
> 3. Revert YARN-514 and add back-pressure to the job submission. For example, 
> we set a maximum number of threads that can simultaneously be assigned to 
> handle job submissions.  When that threshold is reached, new job submissions 
> get a try-again-later response. This is also a generally useful feature and 
> should be a fairly constrained set of changes.
> I think the third option is the most approachable.  It's the smallest change, 
> and it adds useful behavior beyond solving the original issue.





[jira] [Commented] (YARN-4665) Asynch submit can lose application submissions

2016-02-03 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130906#comment-15130906
 ] 

Naganarasimha G R commented on YARN-4665:
-

In that case, would it be helpful to have retry logic in 
{{RMWebServices.submitApplication}}, so that the submission either succeeds or 
fails during an RM failover?

> Asynch submit can lose application submissions
> --
>
> Key: YARN-4665
> URL: https://issues.apache.org/jira/browse/YARN-4665
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Daniel Templeton
>Assignee: Naganarasimha G R
>
> The change introduced in YARN-514 opens up a hole into which applications can 
> fall and be lost.  Prior to YARN-514, the {{submitApplication()}} call did 
> not complete until the application state was persisted to the state store.  
> After YARN-514, the {{submitApplication()}} call is asynchronous, with the 
> application state being saved later.
> If the state store is slow or unresponsive, it may be that an application's 
> state may not be persisted for quite a while.  During that time, if the RM 
> fails (over), all applications that have not yet been persisted to the state 
> store will be lost.  If the active RM loses ZK connectivity, a significant 
> number of job submissions can pile up before the ZK connection times out, 
> resulting in a large pile of client failures when it finally does.
> This issue is inherent in the design of YARN-514.  I see three solutions:
> 1. Add a WAL to the state store. HBase does it, so we know how to do it. It 
> seems like a heavy solution to the original problem, however.  It's certainly 
> not a trivial change.
> 2. Revert YARN-514 and update the RPC layer to allow a connection to be 
> parked if it's doing something that may take a while. This is a generally 
> useful feature but could be a deep rabbit hole.
> 3. Revert YARN-514 and add back-pressure to the job submission. For example, 
> we set a maximum number of threads that can simultaneously be assigned to 
> handle job submissions.  When that threshold is reached, new job submissions 
> get a try-again-later response. This is also a generally useful feature and 
> should be a fairly constrained set of changes.
> I think the third option is the most approachable.  It's the smallest change, 
> and it adds useful behavior beyond solving the original issue.





[jira] [Updated] (YARN-4665) Asynch submit can lose application submissions

2016-02-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4665:

Assignee: Daniel Templeton  (was: Naganarasimha G R)

> Asynch submit can lose application submissions
> --
>
> Key: YARN-4665
> URL: https://issues.apache.org/jira/browse/YARN-4665
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> The change introduced in YARN-514 opens up a hole into which applications can 
> fall and be lost.  Prior to YARN-514, the {{submitApplication()}} call did 
> not complete until the application state was persisted to the state store.  
> After YARN-514, the {{submitApplication()}} call is asynchronous, with the 
> application state being saved later.
> If the state store is slow or unresponsive, it may be that an application's 
> state may not be persisted for quite a while.  During that time, if the RM 
> fails (over), all applications that have not yet been persisted to the state 
> store will be lost.  If the active RM loses ZK connectivity, a significant 
> number of job submissions can pile up before the ZK connection times out, 
> resulting in a large pile of client failures when it finally does.
> This issue is inherent in the design of YARN-514.  I see three solutions:
> 1. Add a WAL to the state store. HBase does it, so we know how to do it. It 
> seems like a heavy solution to the original problem, however.  It's certainly 
> not a trivial change.
> 2. Revert YARN-514 and update the RPC layer to allow a connection to be 
> parked if it's doing something that may take a while. This is a generally 
> useful feature but could be a deep rabbit hole.
> 3. Revert YARN-514 and add back-pressure to the job submission. For example, 
> we set a maximum number of threads that can simultaneously be assigned to 
> handle job submissions.  When that threshold is reached, new job submissions 
> get a try-again-later response. This is also a generally useful feature and 
> should be a fairly constrained set of changes.
> I think the third option is the most approachable.  It's the smallest change, 
> and it adds useful behavior beyond solving the original issue.





[jira] [Updated] (YARN-4616) Default RM retry interval (30s) is too long

2016-02-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4616:
--
Fix Version/s: (was: 2.8.0)

> Default RM retry interval (30s) is too long
> ---
>
> Key: YARN-4616
> URL: https://issues.apache.org/jira/browse/YARN-4616
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
>
> I think the default 30s for the RM retry interval is too long.
> The default node-heartbeat-interval is only 1s 





[jira] [Updated] (YARN-4667) RM Admin CLI for refreshNodesResources throws NPE when nothing is configured

2016-02-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4667:

Attachment: YARN-4667.v1.001.patch

Attaching a patch to fix this issue. [~rohithsharma]/[~devaraj.k], can one of 
you review this simple fix?


> RM Admin CLI for refreshNodesResources throws NPE when nothing is configured
> 
>
> Key: YARN-4667
> URL: https://issues.apache.org/jira/browse/YARN-4667
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4667.v1.001.patch
>
>
> {quote}
> $ ./yarn rmadmin -refreshNodesResources
> 16/02/03 10:54:27 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8033
> refreshNodesResources: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshNodesResources(AdminService.java:655)
>   at 
> org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.refreshNodesResources(ResourceManagerAdministrationProtocolPBServiceImpl.java:246)
>   at 
> org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:287)
> {quote}





[jira] [Commented] (YARN-4665) Asynch submit can lose application submissions

2016-02-03 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130825#comment-15130825
 ] 

Naganarasimha G R commented on YARN-4665:
-

Thanks for the clarifications, [~vvasudev] & [~jlowe].
bq.  but I'd expect the submission logic to be a POST followed by GET polling 
until the state is ACCEPTED or later. If the GET results in a no-such-app error 
then the client retries the POST and continues polling.
IIUC, a REST API user *needs to take care explicitly* of submitting in the 
above-mentioned way so that the app is successfully submitted. If so, we should 
capture this in the documentation, as nothing about it is mentioned in the 2.7.2 
docs. Or correct me if I am missing something. 
[~vvasudev],
bq.  Internally the functionality uses the same code flow as the RPC path - all 
calls flow through ClientRMService#submitApplication. 
IIUC, the concern here is that because app submission is asynchronous, the submit 
call might return successfully while the state-store operation fails, so on RM 
failover the submitted app is lost. In the case of {{YarnClient}}, the client 
takes care of re-requesting until the app state is appropriate, but in the case 
of REST, the caller/user needs to take care of calling GET on the apps resource 
after doing a POST submission of an app. ??subsequent re-submits?? are handled on 
the server side, but the client needs to retry until it no longer gets a 
no-such-app error, right?

> Asynch submit can lose application submissions
> --
>
> Key: YARN-4665
> URL: https://issues.apache.org/jira/browse/YARN-4665
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> The change introduced in YARN-514 opens up a hole into which applications can 
> fall and be lost.  Prior to YARN-514, the {{submitApplication()}} call did 
> not complete until the application state was persisted to the state store.  
> After YARN-514, the {{submitApplication()}} call is asynchronous, with the 
> application state being saved later.
> If the state store is slow or unresponsive, it may be that an application's 
> state may not be persisted for quite a while.  During that time, if the RM 
> fails (over), all applications that have not yet been persisted to the state 
> store will be lost.  If the active RM loses ZK connectivity, a significant 
> number of job submissions can pile up before the ZK connection times out, 
> resulting in a large pile of client failures when it finally does.
> This issue is inherent in the design of YARN-514.  I see three solutions:
> 1. Add a WAL to the state store. HBase does it, so we know how to do it. It 
> seems like a heavy solution to the original problem, however.  It's certainly 
> not a trivial change.
> 2. Revert YARN-514 and update the RPC layer to allow a connection to be 
> parked if it's doing something that may take a while. This is a generally 
> useful feature but could be a deep rabbit hole.
> 3. Revert YARN-514 and add back-pressure to the job submission. For example, 
> we set a maximum number of threads that can simultaneously be assigned to 
> handle job submissions.  When that threshold is reached, new job submissions 
> get a try-again-later response. This is also a generally useful feature and 
> should be a fairly constrained set of changes.
> I think the third option is the most approachable.  It's the smallest change, 
> and it adds useful behavior beyond solving the original issue.





[jira] [Commented] (YARN-4665) Asynch submit can lose application submissions

2016-02-03 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130847#comment-15130847
 ] 

Varun Vasudev commented on YARN-4665:
-

{quote}
IIUC, the concern here is that because app submission is asynchronous, the submit 
call might return successfully while the state-store operation fails, so on RM 
failover the submitted app is lost. In the case of YarnClient, the client takes 
care of re-requesting until the app state is appropriate, but in the case of 
REST, the caller/user needs to take care of calling GET on the apps resource 
after doing a POST submission of an app. Subsequent re-submits are handled on the 
server side, but the client needs to retry until it no longer gets a no-such-app 
error, right?
{quote}

Yes. In the REST case, the submit call will return a 202 Accepted. It's the 
responsibility of the REST client to poll to figure out the state and re-submit 
if necessary. 
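A bare-bones sketch of that client-side pattern (the RM address, app id, and 
submission JSON below are placeholders; the endpoints are the standard 
/ws/v1/cluster/apps ones):
{code}
import java.net.HttpURLConnection;
import java.net.URL;

public class RestSubmitRetrySketch {
  public static void main(String[] args) throws Exception {
    String rm = "http://rm-host:8088";                      // placeholder RM address
    String appId = "application_1454466540000_0001";        // from apps/new-application
    String submission = "{...}";                             // submission context JSON elided

    while (true) {
      // POST returns 202 Accepted; the app may still be lost if the RM fails
      // over before the state-store write completes.
      request("POST", rm + "/ws/v1/cluster/apps", submission);
      Thread.sleep(1000);
      int status = request("GET", rm + "/ws/v1/cluster/apps/" + appId, null);
      if (status == 200) {
        break;  // the app is known to the RM; keep polling its state if needed
      }
      // no-such-app (404): retry the POST and poll again
    }
  }

  private static int request(String method, String url, String body) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod(method);
    if (body != null) {
      conn.setDoOutput(true);
      conn.setRequestProperty("Content-Type", "application/json");
      conn.getOutputStream().write(body.getBytes("UTF-8"));
    }
    int code = conn.getResponseCode();
    conn.disconnect();
    return code;
  }
}
{code}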

> Asynch submit can lose application submissions
> --
>
> Key: YARN-4665
> URL: https://issues.apache.org/jira/browse/YARN-4665
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> The change introduced in YARN-514 opens up a hole into which applications can 
> fall and be lost.  Prior to YARN-514, the {{submitApplication()}} call did 
> not complete until the application state was persisted to the state store.  
> After YARN-514, the {{submitApplication()}} call is asynchronous, with the 
> application state being saved later.
> If the state store is slow or unresponsive, it may be that an application's 
> state may not be persisted for quite a while.  During that time, if the RM 
> fails (over), all applications that have not yet been persisted to the state 
> store will be lost.  If the active RM loses ZK connectivity, a significant 
> number of job submissions can pile up before the ZK connection times out, 
> resulting in a large pile of client failures when it finally does.
> This issue is inherent in the design of YARN-514.  I see three solutions:
> 1. Add a WAL to the state store. HBase does it, so we know how to do it. It 
> seems like a heavy solution to the original problem, however.  It's certainly 
> not a trivial change.
> 2. Revert YARN-514 and update the RPC layer to allow a connection to be 
> parked if it's doing something that may take a while. This is a generally 
> useful feature but could be a deep rabbit hole.
> 3. Revert YARN-514 and add back-pressure to the job submission. For example, 
> we set a maximum number of threads that can simultaneously be assigned to 
> handle job submissions.  When that threshold is reached, new job submissions 
> get a try-again-later response. This is also a generally useful feature and 
> should be a fairly constrained set of changes.
> I think the third option is the most approachable.  It's the smallest change, 
> and it adds useful behavior beyond solving the original issue.





[jira] [Updated] (YARN-3669) Attempt-failures validity interval should have a global admin configurable lower limit

2016-02-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3669:

Attachment: YARN-3669.2.patch

> Attempt-failures validity interval should have a global admin configurable 
> lower limit
> ---
>
> Key: YARN-3669
> URL: https://issues.apache.org/jira/browse/YARN-3669
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Xuan Gong
>  Labels: newbie
> Attachments: YARN-3669.1.patch, YARN-3669.2.patch
>
>
> Found this while reviewing YARN-3480.
> bq. When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to 
> a small value, retried attempts might be very large. So we need to delete 
> some attempts stored in RMStateStore and RMStateStore.
> I think we need to have a lower limit on the failure-validity interval to 
> avoid situations like this.
> Having this will avoid pardoning too many failures in too short a duration.
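A minimal illustration of the clamping idea described above (hypothetical names, 
not the actual patch):
{code}
public class ValidityIntervalLowerLimitSketch {
  /** Clamp the app-requested validity interval to an admin-configured lower limit. */
  static long effectiveValidityIntervalMs(long appRequestedMs, long adminLowerLimitMs) {
    return Math.max(appRequestedMs, adminLowerLimitMs);
  }
}
{code}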


