[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2015-12-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069443#comment-15069443
 ] 

Rohith Sharma K S commented on YARN-4497:
-

Thinking about when it can happen that attempt1 is stored, attempt2 is not 
stored, and attempt3 is stored. One way is to manually delete the attempt2 node 
from ZooKeeper. 
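
For illustration, a minimal sketch of that manual deletion with the plain 
ZooKeeper Java client; the connection string, the ids, and the znode path 
(assumed to follow the default ZKRMStateStore layout) are all placeholders:
{code:java}
import org.apache.zookeeper.ZooKeeper;

public class DeleteAttemptZnode {
  public static void main(String[] args) throws Exception {
    // All values below are illustrative assumptions.
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 10000, event -> { });
    String attemptPath = "/rmstore/ZKRMStateRoot/RMAppRoot"
        + "/application_1450000000000_0001"
        + "/appattempt_1450000000000_0001_000002";
    zk.delete(attemptPath, -1); // version -1 matches any znode version
    zk.close();
  }
}
{code}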

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> Found the following problem when discussing YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore. For the case of storing attempt1, attempt2 and 
> attempt3, suppose RM successfully stored attempt1 and attempt3 but failed to 
> store attempt2. When RM restarts, in *RMAppImpl#recover* we recover attempts 
> one by one; for this case, we will recover attempt1, then attempt2. When 
> recovering attempt2, we call 
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks up 
> its ApplicationAttemptStateData; since it cannot find it, an error is raised 
> at *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2015-12-23 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069454#comment-15069454
 ] 

Jun Gong commented on YARN-4497:


In *RMStateStore#notifyStoreOperationFailedInternal*, RMStateStore might skip 
store errors. So RMStateStore might fail to store attempt2 for some reason 
(e.g., a network error) while the app continues running and starts a new 
attempt, attempt3; RMStateStore then stores attempt3 successfully (assuming the 
network is OK by then).
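
To summarize the branch being discussed here (and the FENCED/fail-fast points 
below), a toy model of the three possible outcomes of a store failure; this is 
an editorial sketch, not the RM's actual code:
{code:java}
// Toy model of RMStateStore's reaction to a store failure (editorial sketch).
enum Outcome { FENCE, FAIL_FAST, SKIP }

final class StoreFailurePolicy {
  static Outcome onStoreFailure(boolean haEnabled, boolean failFast) {
    if (haEnabled) {
      return Outcome.FENCE;     // ACTIVE -> FENCED; no FENCED -> ACTIVE
    }
    if (failFast) {
      return Outcome.FAIL_FAST; // RM goes down immediately
    }
    return Outcome.SKIP;        // error only logged; the app keeps running,
                                // which is how attempt2 can go missing while
                                // attempt3 is stored successfully
  }
}
{code}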

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> Found the following problem when discussing YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore. For the case of storing attempt1, attempt2 and 
> attempt3, suppose RM successfully stored attempt1 and attempt3 but failed to 
> store attempt2. When RM restarts, in *RMAppImpl#recover* we recover attempts 
> one by one; for this case, we will recover attempt1, then attempt2. When 
> recovering attempt2, we call 
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks up 
> its ApplicationAttemptStateData; since it cannot find it, an error is raised 
> at *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2015-12-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069470#comment-15069470
 ] 

Rohith Sharma K S commented on YARN-4497:
-

Currently, if any error happens while storing into the RMStateStore, the 
RMStateStore is FENCED, so no more attempts are stored in the state-store. And 
the RMStateStore state machine only has a transition from {{ACTIVE to 
FENCED}}; there is no {{FENCED to ACTIVE}}. 

If I am missing anything in the flow, could you explain in more detail? 

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> Found the following problem when discussing YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore. For the case of storing attempt1, attempt2 and 
> attempt3, suppose RM successfully stored attempt1 and attempt3 but failed to 
> store attempt2. When RM restarts, in *RMAppImpl#recover* we recover attempts 
> one by one; for this case, we will recover attempt1, then attempt2. When 
> recovering attempt2, we call 
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks up 
> its ApplicationAttemptStateData; since it cannot find it, an error is raised 
> at *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2015-12-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069477#comment-15069477
 ] 

Rohith Sharma K S commented on YARN-4497:
-

I got your point: if RM HA is not configured and fail-fast is false, this would 
happen.

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> Found the following problem when discussing YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore. For the case of storing attempt1, attempt2 and 
> attempt3, suppose RM successfully stored attempt1 and attempt3 but failed to 
> store attempt2. When RM restarts, in *RMAppImpl#recover* we recover attempts 
> one by one; for this case, we will recover attempt1, then attempt2. When 
> recovering attempt2, we call 
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks up 
> its ApplicationAttemptStateData; since it cannot find it, an error is raised 
> at *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2015-12-23 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069480#comment-15069480
 ] 

Jun Gong commented on YARN-4497:


Yes, it is the problem.

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> Found the following problem when discussing YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore. For the case of storing attempt1, attempt2 and 
> attempt3, suppose RM successfully stored attempt1 and attempt3 but failed to 
> store attempt2. When RM restarts, in *RMAppImpl#recover* we recover attempts 
> one by one; for this case, we will recover attempt1, then attempt2. When 
> recovering attempt2, we call 
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks up 
> its ApplicationAttemptStateData; since it cannot find it, an error is raised 
> at *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-23 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069537#comment-15069537
 ] 

Sunil G commented on YARN-4352:
---

Yes, it's related. I'll fix it.

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>  Labels: security
> Attachments: 0001-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see that the tests in TestYarnClient, TestAMRMClient and TestNMClient 
> time out, which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069717#comment-15069717
 ] 

Hadoop QA commented on YARN-4098:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 18s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 0m 58s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779251/0003-YARN-4098.patch |
| JIRA Issue | YARN-4098 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 88545b721bd1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 882f2f0 |
| asflicense | 
https://builds.apache.org/job/PreCommit-YARN-Build/10082/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Max memory used | 29MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10082/console |


This message was automatically generated.



> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4501) Document new put APIs in TimelineClient for ATS 1.5

2015-12-23 Thread Junping Du (JIRA)
Junping Du created YARN-4501:


 Summary: Document new put APIs in TimelineClient for ATS 1.5
 Key: YARN-4501
 URL: https://issues.apache.org/jira/browse/YARN-4501
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Junping Du
Assignee: Xuan Gong


In YARN-4234, we are adding new put APIs in TimelineClient; we should document 
them properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069621#comment-15069621
 ] 

Hudson commented on YARN-4234:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9018 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9018/])
YARN-4234. New put APIs in TimelineClient for ats v1.5. Contributed by 
(junping_du: rev 882f2f04644a13cadb93070d5545f7a4f8691fde)
* q
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClientForATS1_5.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineWriter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntityGroupId.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestTimelineEntityGroupId.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/FileSystemTimelineWriter.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/DirectTimelineWriter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java


> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, 
> YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, 
> YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069744#comment-15069744
 ] 

Sunil G commented on YARN-4098:
---

Looks good to me. Will wait for [~jianhe]'s comments as well.

> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069625#comment-15069625
 ] 

Junping Du commented on YARN-4234:
--

Forgot to mention: I think we need to document the new APIs that we are adding 
here. Just filed YARN-4501 to track this effort.

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, 
> YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, 
> YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069702#comment-15069702
 ] 

Hadoop QA commented on YARN-4098:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 21s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 1m 1s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779248/0002-YARN-4098.patch |
| JIRA Issue | YARN-4098 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 08368cfedd76 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 882f2f0 |
| asflicense | 
https://builds.apache.org/job/PreCommit-YARN-Build/10081/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Max memory used | 30MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10081/console |


This message was automatically generated.



> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2015-12-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069629#comment-15069629
 ] 

Junping Du commented on YARN-4265:
--

I just committed YARN-4234. [~gtCarrera9], would you rebase your patch on the 
latest trunk branch? Thanks!
Hi [~jlowe], I saw your comment above: "This looks like most of the patch is a 
copy of the entity timeline store from YARN-3942 with a few edits, so I'm sorta 
reviewing my own code here. As such I did a diff of the patch from this JIRA 
and the one from YARN-3942 so I could focus on what's changed. I'll defer to 
others to review the parts that are identical to YARN-3942. Eventually I can 
see this being a superset of YARN-3942, since it can cache to memory and either 
cache everything or a subset based on what the plugins decide." Are you OK with 
continuing the review effort on this patch, or do you have other 
preferences?


> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.poc_001.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior to the 
> EntityFileTimelineStore proposed in YARN-3942, but it caches data at cache-id 
> granularity instead of application-id granularity. Let's have this storage 
> be standalone, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2015-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069633#comment-15069633
 ] 

Hadoop QA commented on YARN-4265:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} 
| {color:red} YARN-4265 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12776931/YARN-4265.YARN-4234.002.patch
 |
| JIRA Issue | YARN-4265 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10080/console |


This message was automatically generated.



> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.poc_001.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior to the 
> EntityFileTimelineStore proposed in YARN-3942, but it caches data at cache-id 
> granularity instead of application-id granularity. Let's have this storage 
> be standalone, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4098:

Attachment: 0003-YARN-4098.patch

> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069723#comment-15069723
 ] 

Rohith Sharma K S commented on YARN-4098:
-

bq. -1  asflicense
No new files are added, and the existing modified file has the ASF header.

> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069726#comment-15069726
 ] 

Rohith Sharma K S commented on YARN-4098:
-

[~sunilg]/[~jianhe], kindly review the patch.

> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069691#comment-15069691
 ] 

Rohith Sharma K S commented on YARN-4098:
-

Updated the patch fixing review comments.

> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4098:

Attachment: 0002-YARN-4098.patch

> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070006#comment-15070006
 ] 

Sangjin Lee commented on YARN-4224:
---

Sorry, I am catching up with the discussion. Just to give my opinions on some 
of the questions raised so far.

Regarding omitting some part of the path in the hierarchical form of the URL:
bq. Sangjin Lee did you mean providing shortcuts to things like applications 
(instead of cluster, user, flow, flowrun, app, we can directly have cluster and 
app)?
Yes, for example, when you query for things like all apps in a flow run, it is 
possible to omit things like "user" as it can be inferred from the rest of the 
information. Although the path is /cluster/user/flow/flow-run-id/apps, I was 
hoping one could do /cluster/flow/flow-run-id/apps and the server would accept 
it as long as it can infer the missing path from the rest of the context. The 
UID form, however, would have to specify all parts of the information without 
exception, to eliminate any ambiguity. I hope that answers the question.
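
To make the shortcut concrete, an illustrative pair of requests (all names are 
placeholders):
{noformat}
# Full hierarchical form:
GET /ws/v2/timeline/cluster1/user1/flow1/run1/apps
# Shortened form; the server infers "user1" from the flow context:
GET /ws/v2/timeline/cluster1/flow1/run1/apps
{noformat}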

Regarding creating the UID, I think we still need to make a call on whether to 
make the UID composition a public protocol. If we do, then potentially we don't 
need to return anything and don't have to worry about which layer on the 
server side composes it.

On a related note, I'm leaning against making the UID composition configurable. 
I don't see a whole lot of practical need to customize UID composition, and it 
would only cause more confusion, especially when a user/client deals with 
multiple clusters.

On specifying the entity type along with the entity's UID, I think it would 
definitely be better if it were not required. My memory is a bit hazy on this, 
but I think there is no hard guarantee that an entity id is unique even within 
a parent YARN app. Entity ids are essentially up to whoever writes them, and 
they may choose degenerate ids. I think we always said only the tuple of 
(entity type, entity id) is unique within an application, right? So, what is 
the required info for uniquely locating an entity? Entity type and entity id 
are needed, but what about the context? App id? Any flow contexts?

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2015-12-23 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4479:

Attachment: 0002-YARN-4479.patch

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, during 
> recovery the high-priority applications get activated first. It is possible 
> that a low-priority job was submitted earlier and was in the running state. 
> This causes the low-priority job to starve after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-23 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4352:
--
Attachment: 0002-YARN-4352.patch

Attaching an updated patch addressing the test failures.

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>  Labels: security
> Attachments: 0001-YARN-4352.patch, 0002-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see that the tests in TestYarnClient, TestAMRMClient and TestNMClient 
> time out, which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070013#comment-15070013
 ] 

Hadoop QA commented on YARN-4352:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
47s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
35s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
59s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 28s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 42s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 82m 32s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779269/0002-YARN-4352.patch |
| JIRA Issue | YARN-4352 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 1f5d40077910 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision 

[jira] [Commented] (YARN-4400) AsyncDispatcher.waitForDrained should be final

2015-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069883#comment-15069883
 ] 

Hudson commented on YARN-4400:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9019 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9019/])
YARN-4400. AsyncDispatcher.waitForDrained should be final. Contributed 
(junping_du: rev bb5df272b9c0be9830ee8480cd33e75d26deb9d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
* hadoop-yarn-project/CHANGES.txt


> AsyncDispatcher.waitForDrained should be final
> --
>
> Key: YARN-4400
> URL: https://issues.apache.org/jira/browse/YARN-4400
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4400.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070009#comment-15070009
 ] 

Naganarasimha G R commented on YARN-3367:
-

Thanks [~djp] for looking into this issue.
bq. Sounds good. I just committed YARN-4400 to trunk.
Sorry, I was misled by the wrong JIRA number; I actually thought YARN-4457 from 
[~templedf] could solve the issue. I just took a further look: *AsyncDispatcher* 
has been coded to handle Events only (e.g. AsyncDispatcher's 
*BlockingQueue<Event> eventQueue* and EventHandler's *handle(T event)*). Hence 
it is not easy to replace, so reusing it for dispatching *Timeline Entities* is 
difficult (far too many changes for little reusability).

bq. Can TimelineEntityAsyncDispatcher be reused by other classes? If not, 
better to keep it as a private class.
Though the plan is for this class to be used only by TimelineClientImpl, that 
class is getting cluttered with V1 and V2 code, which impacts readability; 
hence I thought the v2 publishing part of the code in TimelineClientImpl should 
be moved along with TimelineEntityAsyncDispatcher. Thoughts?
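
For reference, a minimal sketch of the queue-plus-single-thread shape this JIRA 
describes; the class and method names are illustrative, not the patch's actual 
code:
{code:java}
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: putEntities() only enqueues; one daemon thread delivers entities
// to the collector in submission order, so callers never block on REST calls.
final class EntityDispatchLoop {
  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
  private final Thread loop = new Thread(this::run, "TimelineEntityDispatcher");

  EntityDispatchLoop() {
    loop.setDaemon(true);
    loop.start();
  }

  void putEntities(List<?> entities) {
    queue.addAll(entities);           // non-blocking for the caller
  }

  private void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        Object entity = queue.take(); // preserves submission order
        postToCollector(entity);      // stand-in for the REST delivery
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private void postToCollector(Object entity) {
    // Illustrative placeholder for the actual REST call.
  }

  void stop() {
    loop.interrupt();
  }
}
{code}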

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we added a loop in TimelineClient to wait for the 
> collectorServiceAddress to be ready before posting any entity. In consumers 
> of TimelineClient (like the AM), we start a new thread for each call to get 
> rid of a potential deadlock in the main thread. This approach has at least 3 
> major defects:
> 1. The consumer needs some additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting 
> operation thread gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread delivers the queued entities to the collector via REST calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4400) AsyncDispatcher.waitForDrained should be final

2015-12-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069849#comment-15069849
 ] 

Junping Du commented on YARN-4400:
--

Nice catch, [~templedf]! +1 on the patch, committing it now.

> AsyncDispatcher.waitForDrained should be final
> --
>
> Key: YARN-4400
> URL: https://issues.apache.org/jira/browse/YARN-4400
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Trivial
> Attachments: YARN-4400.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069952#comment-15069952
 ] 

Junping Du commented on YARN-3367:
--

Thanks Naga for updating the patch! I quickly went through it but haven't done 
a deep dive yet. 
Quick responses to your comments above:
bq. I could reuse/extend Async Dispatcher after YARN-4400 is committed to trunk.
Sounds good. I just committed YARN-4400 to trunk.

bq. I think it can be more organized if I move all this related code 
(dispatcher code) to a new class.
Can TimelineEntityAsyncDispatcher be reused by other classes? If not, better to 
keep it as a private class.

bq. will work on other locations (removing the thread pools on the caller side) 
once the approach is finalized.
Makes sense. That could make the caller code much simpler.

More comments to come later.


> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we added a loop in TimelineClient to wait for the 
> collectorServiceAddress to be ready before posting any entity. In consumers 
> of TimelineClient (like the AM), we start a new thread for each call to get 
> rid of a potential deadlock in the main thread. This approach has at least 3 
> major defects:
> 1. The consumer needs some additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting 
> operation thread gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread delivers the queued entities to the collector via REST calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069951#comment-15069951
 ] 

Varun Saxena commented on YARN-4224:


To aid in tonight's discussion, I will jot down the REST endpoints added and 
the points to discuss.
[~gtCarrera9], if you have suggestions on these endpoints, you can jot them 
down here as well, so that we can have a faster discussion during the call.
* REST endpoints based on UID as per the current patch are as follows:
{panel}
*Query multiple flows* : Endpoint is */ws/v2/timeline/flows or 
/ws/v2/timeline/\{clusterid\}/flows*. This query will return a UID of the form 
*cluster:user:flowname* for each flow name.
*Query multiple flowruns* : Endpoint is */ws/v2/timeline/runs/\{flow UID\}* 
where the flow UID is of the form *cluster:user:flowname*, i.e. the one 
returned by the query above. This query returns a UID of the form 
*cluster:user:flowname:runid* for each flow run.
*Query single flowrun* : Endpoint is */ws/v2/timeline/run/\{flowrun UID\}* 
where the flowrun UID is of the form *cluster:user:flowname:runid*, i.e. the 
one returned by the query above. This query also returns a UID of the form 
*cluster:user:flowname:runid* for the flowrun returned. Is this required for 
the Web UI?
*Query multiple apps in a flowrun* : Endpoint is 
*/ws/v2/timeline/runapps/\{flowrun UID\}* where the flowrun UID is of the form 
*cluster:user:flowname:runid* (runapps because we are querying apps within a 
flowrun). The hierarchical form has an endpoint to query apps within a flow 
name as well. This query also returns a UID of the form 
*cluster:user:flowname:runid:appid* for each app returned.
*Query single app* : Endpoint is */ws/v2/timeline/app/\{app UID\}* where the 
app UID is of the form *cluster:user:flowname:runid:appid*, i.e. the one 
returned by the query above.
*Query Entities* : The current endpoint is 
*/ws/v2/timeline/entities/\{entitytype\}/\{app UID\}*. The entity type is 
separate because we cannot know the entity type when we query apps. This was 
chosen as the endpoint when we had decided the separator would not be public. 
Now that it will be public, the endpoint can probably be 
*/ws/v2/timeline/entities/\{app UID plus entity type\}*, i.e. the UID will be 
*cluster:user:flowname:runid:appid:entitytype*. But for this specific query, 
the client needs to do an extra operation on the UID returned by the previous 
query, unlike other endpoints. This query also returns a UID of the form 
*cluster:user:flowname:runid:appid:entitytype:entityid* for each entity 
returned.
*Query Entity* : Endpoint is */ws/v2/timeline/entity/\{entity UID\}* where the 
entity UID is of the form 
*cluster:user:flowname:runid:appid:entitytype:entityid*.
{panel}

* Need to discuss the pros and cons of filling in the UID inside the storage 
layer versus outside it.

We can add an endpoint for a single flow once offline aggregation is done.
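
As a side note on the UID forms above, a toy compose/parse pair; the ':' 
separator and the escaping rule are assumptions for this sketch, not the 
patch's actual encoding:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy UID codec for values like cluster:user:flowname:runid:appid.
// Escapes ':' (and '\') inside components so parsing stays unambiguous.
final class TimelineUid {
  static String join(String... parts) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) sb.append(':');
      sb.append(parts[i].replace("\\", "\\\\").replace(":", "\\:"));
    }
    return sb.toString();
  }

  static List<String> split(String uid) {
    List<String> parts = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    for (int i = 0; i < uid.length(); i++) {
      char c = uid.charAt(i);
      if (c == '\\' && i + 1 < uid.length()) {
        cur.append(uid.charAt(++i)); // unescape the next character
      } else if (c == ':') {
        parts.add(cur.toString());
        cur.setLength(0);
      } else {
        cur.append(c);
      }
    }
    parts.add(cur.toString());
    return parts;
  }
}
{code}
For example, {{join("cluster1", "user1", "flow1", "1", "app_1")}} yields 
{{cluster1:user1:flow1:1:app_1}}, and {{split}} recovers the five components.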

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3976) Catch ApplicationNotFoundException instead of parent YarnException in YarnClient and AppReportFetcher

2015-12-23 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3976:
--
Fix Version/s: (was: 2.7.2)

> Catch ApplicationNotFoundException instead of parent YarnException in 
> YarnClient and AppReportFetcher
> -
>
> Key: YARN-3976
> URL: https://issues.apache.org/jira/browse/YARN-3976
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Trivial
>
> It is better to catch ApplicationNotFoundException rather than the parent 
> YarnException, and rethrow when it is not an ApplicationNotFoundException
> {noformat}
>  catch (YarnException e) {
>   if (!historyServiceEnabled) {
> // Just throw it as usual if historyService is not enabled.
> throw e;
>   }
>   // Even if history-service is enabled, treat all exceptions still the 
> same
>   // except the following
>   if (!(e.getClass() == ApplicationNotFoundException.class)) {
> throw e;
>   }
> {noformat}
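
A hedged sketch of what the title suggests; the surrounding call is an 
approximation of the YarnClient code path, not the actual patch:
{code:java}
try {
  return getApplicationReportFromRM(appId);   // hypothetical helper name
} catch (ApplicationNotFoundException e) {
  // Catch the specific subclass directly instead of filtering YarnException.
  if (!historyServiceEnabled) {
    // Just throw it as usual if the history service is not enabled.
    throw e;
  }
  // Otherwise fall through and query the history service instead.
}
{code}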



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2015-12-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070186#comment-15070186
 ] 

Wangda Tan commented on YARN-3870:
--

Hi [~grey],

Thanks for raising this; we definitely need such a mechanism to better describe 
our resource requests.

[~asuresh], I'm not sure how the unique id works. Are you planning to add it as 
a key to the AppSchedulingInfo resource-requests map (e.g. {{Map<unique-id, 
Map<resource-name, ResourceRequest>>}})?

> Providing raw container request information for fine scheduling
> ---
>
> Key: YARN-3870
> URL: https://issues.apache.org/jira/browse/YARN-3870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>Reporter: Lei Guo
>
> Currently, when the AM sends container requests to the RM and scheduler, it 
> expands individual container requests into host/rack/any format. For 
> instance, if I am asking for a container request with preference "host1, 
> host2, host3", assuming all are in the same rack rack1, instead of sending 
> one raw container request to the RM/scheduler with the raw preference list, 
> it basically expands it into 5 different objects with host1, host2, host3, 
> rack1 and any in there. When the scheduler receives the information, it has 
> basically already lost the raw request. This is OK for a single container 
> request, but it will cause trouble when dealing with multiple container 
> requests from the same application. Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When the application requests two containers with different data-locality 
> preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This will end up with the following container request list when the client 
> sends the request to the RM/scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgment 
> without knowing the raw container requests. The situation gets worse when 
> dealing with affinity and anti-affinity or even gang scheduling etc.
> We need some way to provide raw container request information for fine 
> scheduling purposes.
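
To make the expansion arithmetic above concrete, a small self-contained sketch 
that reproduces the instance counts in this example; it is an editorial toy, 
not the actual AMRMClient expansion code:
{code:java}
import java.util.*;

public class ExpansionDemo {
  public static void main(String[] args) {
    // Rack membership from the example above.
    Map<String, String> rackOf = new HashMap<>();
    for (String h : List.of("host1", "host2", "host3")) rackOf.put(h, "rack1");
    for (String h : List.of("host4", "host5", "host6")) rackOf.put(h, "rack2");

    // Raw requests: c1 and c2 with their locality preferences.
    List<List<String>> requests = List.of(
        List.of("host1", "host2", "host4"),   // c1
        List.of("host2", "host3", "host5"));  // c2

    // Expand each raw request into host/rack/any entries.
    Map<String, Integer> expanded = new TreeMap<>();
    for (List<String> prefs : requests) {
      Set<String> racks = new HashSet<>();
      for (String host : prefs) {
        expanded.merge(host, 1, Integer::sum);
        racks.add(rackOf.get(host));
      }
      for (String rack : racks) {
        expanded.merge(rack, 1, Integer::sum);
      }
      expanded.merge("any", 1, Integer::sum);
    }
    // Prints {any=2, host1=1, host2=2, host3=1, host4=1, host5=1,
    //         rack1=2, rack2=2} — the same counts as in the description.
    System.out.println(expanded);
  }
}
{code}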



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070229#comment-15070229
 ] 

Li Lu commented on YARN-4224:
-

Actually I think the /ws/v2/timeline/apps/{app UID}/entities?entityType=... 
format looks fine. When querying entities, the entity type is a query parameter 
but may not be mandatory. /ws/v2/timeline/apps/{app UID}/entities semantically 
should list all entities in one application. Implementation-wise, this may not 
be a good idea since there may be too many entities, but there are solutions to 
this problem. For example, we can restrict /ws/v2/timeline/apps/{app 
UID}/entities to always return the first 100 entities. With this design, if 
users would like to list all CONTAINER-type entities, they can add entityType 
as a query parameter. 
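
For example, the shape being discussed would look like this (the UID and 
parameter values are placeholders):
{noformat}
GET /ws/v2/timeline/apps/cluster1:user1:flow1:1:application_1450000000000_0001/entities?entityType=CONTAINER
{noformat}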

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2882) Add ExecutionType to denote if a container execution is GUARANTEED or OPPORTUNISTIC

2015-12-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070329#comment-15070329
 ] 

Wangda Tan commented on YARN-2882:
--

Hi [~asuresh],

Thanks for answering my question, but I still may not understand correctly:
- If opportunistic/guaranteed is solely decided by the scheduler, is it 
possible that the AM cannot get containers with predictable behavior? For 
example, an LRS container should be guaranteed only. Another example is an MR 
job that wants its speculative tasks to be opportunistic only.
- Why add the limitation that AMs can only allocate opportunistic resources 
(your 2nd point)?

> Add ExecutionType to denote if a container execution is GUARANTEED or 
> OPPORTUNISTIC
> ---
>
> Key: YARN-2882
> URL: https://issues.apache.org/jira/browse/YARN-2882
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2882-yarn-2877.001.patch, 
> YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, 
> YARN-2882-yarn-2877.004.patch, yarn-2882.patch
>
>
> This JIRA introduces the notion of container types.
> We propose two initial types of containers: guaranteed-start and queueable 
> containers.
> Guaranteed-start containers are the existing containers, which are allocated 
> by the central RM and started instantaneously once allocated.
> Queueable is a new type of container that can be queued in the NM, so its 
> execution may be arbitrarily delayed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2015-12-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070521#comment-15070521
 ] 

Naganarasimha G R commented on YARN-4479:
-

Hi [~rohithsharma],  Thanks for the patch,
The new approach seems better than the older one, as it avoids the additional 
data structure used for the same purpose, but a few points:
* For FairOrderingPolicy, the {{FairComparator}} is considered first and then 
the {{FifoComparator}}, so only if fairness is equal will it consider whether 
the application was already running. Would it be better to add an additional 
comparator for recovery that can be used by both Fair and Fifo? (See the sketch 
below.)
* It will be left entirely to the ordering policy whether to consider the order 
of recovered apps based on submission time, so it would be better to document 
that so custom ordering policies can take it into account.
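
A minimal sketch of what such a shared recovery comparator could look like (the 
interface below is assumed purely for illustration; it is not an existing YARN 
API):
{code}
import java.util.Comparator;

// Assumed accessor, for illustration only.
interface RecoverableApp {
  boolean wasRunningBeforeRecovery();
}

// A recovery-aware comparator that the Fair and Fifo ordering policies
// could chain in front of their existing comparators.
class RecoveryComparator implements Comparator<RecoverableApp> {
  @Override
  public int compare(RecoverableApp a, RecoverableApp b) {
    boolean ra = a.wasRunningBeforeRecovery();
    boolean rb = b.wasRunningBeforeRecovery();
    if (ra != rb) {
      return ra ? -1 : 1; // apps that were already running order first
    }
    return 0; // tie: let the policy's own Fair/Fifo comparator decide
  }
}
{code}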

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, during 
> recovery high-priority applications get activated first. It is possible that 
> a low-priority job was submitted and in running state. 
> This causes the low-priority job to starve after recovery



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070336#comment-15070336
 ] 

Jian He commented on YARN-4098:
---

looks good overall 
- this feature allows applications to be submitted and scheduled with different 
priorities.
 maybe {{this feature allows applications to be submitted and scheduled with 
different priorities.}}
- "which is greater then" - should be "greater than" 

> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2015-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070353#comment-15070353
 ] 

Hadoop QA commented on YARN-4265:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 32s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server 
no findbugs output file 
(hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/target/findbugsXml.xml) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 45s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 54 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 292, now 345). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 42s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server 
no findbugs output file 
(hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/target/findbugsXml.xml) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 12s {color} 
| {color:red} hadoop-yarn-server in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s 
{color} | {color:green} hadoop-yarn-common 

[jira] [Commented] (YARN-4156) TestAMRestart#testAMBlacklistPreventsRestartOnSameNode assumes CapacityScheduler

2015-12-23 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070387#comment-15070387
 ] 

Karthik Kambatla commented on YARN-4156:


+1

> TestAMRestart#testAMBlacklistPreventsRestartOnSameNode assumes 
> CapacityScheduler
> 
>
> Key: YARN-4156
> URL: https://issues.apache.org/jira/browse/YARN-4156
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-4156.001.patch
>
>
> The test assumes the scheduler is CapacityScheduler without configuring it as 
> such. This causes it to fail if the default is something else, such as the 
> FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-23 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070495#comment-15070495
 ] 

Masatake Iwasaki commented on YARN-4234:


[~djp], a file named "q" seems to have been accidentally added to the top 
directory. I'm attaching an addendum patch to remove the file.

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, 
> YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, 
> YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-23 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070534#comment-15070534
 ] 

MENG DING commented on YARN-4138:
-

Hi [~jianhe], which file(s) are you referring to in particular?

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4156) TestAMRestart#testAMBlacklistPreventsRestartOnSameNode assumes CapacityScheduler

2015-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070492#comment-15070492
 ] 

Hudson commented on YARN-4156:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9021 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9021/])
YARN-4156. TestAMRestart#testAMBlacklistPreventsRestartOnSameNode (kasha: rev 
0af492b4bdb0356ea04e13690b78a236b82bd40c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


> TestAMRestart#testAMBlacklistPreventsRestartOnSameNode assumes 
> CapacityScheduler
> 
>
> Key: YARN-4156
> URL: https://issues.apache.org/jira/browse/YARN-4156
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.9.0
>
> Attachments: YARN-4156.001.patch
>
>
> The test assumes the scheduler is CapacityScheduler without configuring it as 
> such. This causes it to fail if the default is something else, such as the 
> FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070575#comment-15070575
 ] 

Sangjin Lee commented on YARN-4224:
---

Regarding the ambiguity between /ws/v2/timeline/apps/\{app 
UID\}/entities/\{entitytype\} (UID) and 
/ws/v2/timeline/apps/app_id/entities/entitytype (hierarchical): doesn't the 
hierarchical URL need more context, such as cluster/user/flow/flow-run? Is it 
because all of them can be omitted?

At any rate, I agree that due to the possibility of omission, ambiguities are 
possible. In that case, I suspect using different query nouns might be the 
ultimate solution (e.g. "apps" for the hierarchical endpoint and "apps-uid" for 
UIDs).

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4353) Provide short circuit user group mapping for NM/AM

2015-12-23 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070358#comment-15070358
 ] 

Karthik Kambatla commented on YARN-4353:


bq. If secure LDAP is configured for group mapping, then there are some 
additional complications created by the unnecessary group resolution.
Could you elaborate? What complications? 

I would think Vinod's suggestion here should work, albeit as a more substantial 
change. Could you also comment on how the change here helps or hurts the 
long-term overall fix? 

> Provide short circuit user group mapping for NM/AM
> --
>
> Key: YARN-4353
> URL: https://issues.apache.org/jira/browse/YARN-4353
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-4353.prelim.patch
>
>
> When the NM launches an AM, the {{ContainerLocalizer}} gets the current user 
> from {{UserGroupInformation}}, which triggers user group mapping, even though 
> the user groups are never accessed.  If secure LDAP is configured for group 
> mapping, then there are some additional complications created by the 
> unnecessary group resolution.  Additionally, it adds unnecessary latency to 
> the container launch time.
> To address the issue, before getting the current user, the 
> {{ContainerLocalizer}} should configure {{UserGroupInformation}} with a null 
> group mapping service that quickly and quietly returns an empty group list 
> for all users.
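
A minimal sketch of such a null mapping, assuming Hadoop's standard 
{{GroupMappingServiceProvider}} interface (the class name is illustrative, not 
the actual patch):
{code}
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.security.GroupMappingServiceProvider;

// Illustrative only: returns an empty group list for every user, so that
// getting the current user never triggers a real (e.g. LDAP) group lookup.
public class NullGroupsMapping implements GroupMappingServiceProvider {
  @Override
  public List<String> getGroups(String user) throws IOException {
    return Collections.emptyList();
  }

  @Override
  public void cacheGroupsRefresh() throws IOException {
    // no cache to refresh
  }

  @Override
  public void cacheGroupsAdd(List<String> groups) throws IOException {
    // nothing to cache
  }
}
{code}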



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4503) Allow for a pluggable policy to decide if a ResourceRequest is GUARANTEED or not

2015-12-23 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-4503:
-

 Summary: Allow for a pluggable policy to decide if a 
ResourceRequest is GUARANTEED or not
 Key: YARN-4503
 URL: https://issues.apache.org/jira/browse/YARN-4503
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun Suresh
Assignee: Arun Suresh


As per discussions on the YARN-2882 thread, specifically [this 
comment|https://issues.apache.org/jira/browse/YARN-2882?focusedCommentId=15065547=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15065547],
 we would require a pluggable policy that can decide whether a ResourceRequest 
is GUARANTEED or OPPORTUNISTIC.
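
A rough sketch of the policy shape being proposed (all names here are 
hypothetical placeholders, pending the actual design):
{code}
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Hypothetical sketch only: a pluggable classifier consulted per request.
// ExecutionType mirrors the GUARANTEED/OPPORTUNISTIC notion from YARN-2882.
public interface ExecutionTypePolicy {
  enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

  // Decide the execution type, e.g. from locality, container size,
  // or current load, as discussed in the YARN-2882 thread.
  ExecutionType classify(ResourceRequest request);
}
{code}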



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-23 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070337#comment-15070337
 ] 

Gera Shegalov commented on YARN-2934:
-

Hi [~Naganarasimha]. Thanks for updating the patch. 

One thing we have not addressed from my previous comments is capping the buffer 
size. But I now think it's good enough, because we have a good small default for 
the tail, NM_CONTAINER_STDERR_BYTES.

Still please rename:
{code}
-  FileStatus[] listStatus = fileSystem
+  FileStatus[] errorStatuses = fileSystem
{code}
or similar. It's an array of statuses, not the status of a list.

Let us have a space after ',' and a new line in:
{code}
-  .append(StringUtils.arrayToString(errorFileNames)).append(". ");
+  .append(StringUtils.join(", ", errorFileNames)).append(".\n");
{code}
Fix the test code accordingly.

The method verifyTailErrorLogOnContainerExit can/should be private. Same for 
the ContainerExitHandler class.

{{Assume.assumeTrue(Shell.LINUX);}}
should be 
{{Assume.assumeFalse(Shell.WINDOWS || Shell.OTHER);}}
but actually, why do we need this at all? The test seems to be 
platform-independent.

{{Assert.assertNotNull(exitEvent.getDiagnosticInfo());}}

seems redundant, because the other asserts that follow already imply it. I 
suggest LOG.info-ing the diagnostics instead, to make the test log more useful.



> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch, YARN-2934.v2.002.patch, YARN-2934.v2.003.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2882) Add ExecutionType to denote if a container execution is GUARANTEED or OPPORTUNISTIC

2015-12-23 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070360#comment-15070360
 ] 

Arun Suresh commented on YARN-2882:
---

Hey [~leftnoteasy],

bq. If opportunistic/guaranteed is solely decided by scheduler..
So, it needn't be decided solely by the Scheduler. Taking YARN-1011 into 
consideration: if oversubscription is required, yes, this will be decided by the 
Scheduler; otherwise, if the NM is configured to support Distributed Scheduling, 
this decision can be made by the LocalScheduler, or via the application of a 
Policy (just created YARN-4503 to track this).

bq. is it possible that AM cannot get container in predictable behavior?
If the above-mentioned policy makes the decision on static parameters such as 
locality or container size, yes, it should be consistent; if it is more dynamic, 
e.g. based on load, then not so much. But we feel the AM should not need to 
know.

bq. Why add limitation to AMs that can only allocate for opportunistic 
resources. 
Apologies if I wasn't clear. What I meant was this: if an AM is not able to 
specify the type of resource request, then we can also ensure that misbehaving 
AMs won't flood the scheduler with only GUARANTEED requests.

> Add ExecutionType to denote if a container execution is GUARANTEED or 
> OPPORTUNISTIC
> ---
>
> Key: YARN-2882
> URL: https://issues.apache.org/jira/browse/YARN-2882
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2882-yarn-2877.001.patch, 
> YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, 
> YARN-2882-yarn-2877.004.patch, yarn-2882.patch
>
>
> This JIRA introduces the notion of container types.
> We propose two initial types of containers: guaranteed-start and queueable 
> containers.
> Guaranteed-start containers are the existing containers, which are allocated 
> by the central RM and are started instantaneously once allocated.
> Queueable is a new type of container that allows containers to be queued in 
> the NM, so their execution may be arbitrarily delayed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-23 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-4234:
---
Attachment: YARN-4234.addendum.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, 
> YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, 
> YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch, 
> YARN-4234.addendum.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4098) Document ApplicationPriority feature

2015-12-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070526#comment-15070526
 ] 

Rohith Sharma K S commented on YARN-4098:
-

bq. may be this feature allows applications to be submitted and scheduled with 
different priorities.
I did not get what change is to be done; both sentences look the same to me. I 
think something is missing.



> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, 
> 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4462) FairScheduler: Disallow preemption from a queue

2015-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070557#comment-15070557
 ] 

Hadoop QA commented on YARN-4462:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 6 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 76, now 79). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 8 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 43s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 29s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 35s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 154m 21s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA 

[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2015-12-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070556#comment-15070556
 ] 

Rohith Sharma K S commented on YARN-4479:
-

I had 2 options for doing this in the FIFO ordering policy. I took the simpler 
approach to get a working patch. Further improvements like this can be addressed 
in coming patches once the initial approach is agreed upon.

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, during 
> recovery high-priority applications get activated first. It is possible that 
> a low-priority job was submitted and in running state. 
> This causes the low-priority job to starve after recovery



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4496) Improve HA ResourceManager Failover detection on the client

2015-12-23 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070140#comment-15070140
 ] 

Subru Krishnan commented on YARN-4496:
--

+1. Thanks [~asuresh] for initiating this. To add more context, our deployments 
are fairly large with multiple secondaries, which results in considerable 
connection latencies with the current failover proxy.

> Improve HA ResourceManager Failover detection on the client
> ---
>
> Key: YARN-4496
> URL: https://issues.apache.org/jira/browse/YARN-4496
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to 
> improve Namenode failover detection in the client. It does this by 
> concurrently trying all namenodes and picking the namenode that returns the 
> fastest successful response as the active node.
> It would be useful to have a similar ProxyProvider for the YARN RM (it can 
> possibly be done by converging some of the class hierarchies to use the same 
> ProxyProvider).
> This would be especially useful for large YARN deployments with multiple 
> standby RMs, where clients will be able to pick the active RM without having 
> to traverse a list of configured RMs. 
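
To illustrate just the hedging idea (this is not the actual 
{{RequestHedgingProxyProvider}} implementation; names and structure are 
assumptions):
{code}
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: probe all candidate endpoints concurrently and
// take the first successful response as the active one.
public class HedgingSketch {
  public static <T> T firstSuccessful(List<Callable<T>> probes)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(probes.size());
    try {
      // invokeAny returns the result of the first task that completes
      // successfully and cancels the remaining ones.
      return pool.invokeAny(probes);
    } finally {
      pool.shutdownNow();
    }
  }
}
{code}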



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2015-12-23 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070178#comment-15070178
 ] 

Subru Krishnan commented on YARN-3870:
--

+1 on this.

Thanks [~grey] for raising this. I have been having offline discussions with 
[~asuresh] and [~curino] around Distributed Scheduling (YARN-2877) and 
Federation (YARN-2915). In both scenarios, sending the raw container request 
and letting the RM expand it will save us a lot of pain, as currently we are 
finding it very difficult to route requests correctly in the AMRMProxy 
(YARN-2844).

> Providing raw container request information for fine scheduling
> ---
>
> Key: YARN-3870
> URL: https://issues.apache.org/jira/browse/YARN-3870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>Reporter: Lei Guo
>
> Currently, when the AM sends container requests to the RM and scheduler, it 
> expands individual container requests into host/rack/any format. For 
> instance, if I ask for a container request with preference "host1, host2, 
> host3", assuming all are in the same rack rack1, instead of sending one raw 
> container request to the RM/Scheduler with the raw preference list, it 
> basically expands it into 5 different objects: host1, host2, host3, rack1 and 
> any. When the scheduler receives this information, it has already lost the 
> raw request. This is OK for a single container request, but it will cause 
> trouble when dealing with multiple container requests from the same 
> application. Consider this case:
> Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When application requests two containers with different data locality 
> preference:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This will end up with following container request list when client sending 
> request to RM/Scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgement 
> without knowing the raw container request. The situation will get worse when 
> dealing with affinity and anti-affinity, or even gang scheduling, etc.
> We need some way to provide raw container request information for fine 
> scheduling purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070212#comment-15070212
 ] 

Varun Saxena commented on YARN-4224:


Thanks [~leftnoteasy].
For the entities endpoint, would /ws/v2/timeline/apps/\{app UID\}/\{entitytype\} 
be fine for the UI? This would be a slight deviation from the other endpoints, 
because the entity type cannot be put as part of the UID in the previous 
(parent) response.
For querying app attempts the entity type will be YARN_APP_ATTEMPT, and for 
containers it will be YARN_CONTAINER, i.e. the endpoints will basically be 
/ws/v2/timeline/apps/\{app UID\}/YARN_APP_ATTEMPT and 
/ws/v2/timeline/apps/\{app UID\}/YARN_CONTAINER respectively.
I don't think the UI will display all possible generic entity types; only app 
attempts and containers will be required.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070218#comment-15070218
 ] 

Varun Saxena commented on YARN-4224:


Another option would be to make the entities endpoint 
{{/ws/v2/timeline/apps/\{app UID\}/entities?entityType=...}}. However, this 
will be a mandatory param (there will be a check on the server side).
Please note that the hierarchical REST endpoint has been kept as 
{{/ws/v2/timeline/apps/\{appid\}/entities/\{entitytype\}}}. Note that app UID 
and app id are not the same thing. We need some differentiation between the UID 
endpoint and the hierarchical endpoint, because if we follow the general scheme 
the endpoints will clash.
Although mandatory params in REST are generally path params, I guess we have no 
other option here. For UID, we can pass the entity type as a query param, and 
for the hierarchical endpoint as a path param. 
It's confusing anyway.

Or should we have endpoints like {{/ws/v2/timeline/runsUID/\{run UID\}/apps}}, 
{{/ws/v2/timeline/appsUID/\{app UID\}}} and {{/ws/v2/timeline/appsUID/\{app 
UID\}/entities/\{entitytype\}}}, thereby clearly indicating that a UID is being 
passed and avoiding the conflict mentioned above?
Thoughts?

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070243#comment-15070243
 ] 

Li Lu commented on YARN-4224:
-

Yes, I think this is fine for entities. The root cause is that entities need 
both an id and a type to be uniquely identified. For UID-based queries we can 
pass the type as a query parameter. For the hierarchical endpoints, the type is 
modeled as a part of the entity id (we have to do this to uniquely identify an 
entity). The clash would happen if we hit the .../apps endpoint, where we have 
to distinguish the two cases. 

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070302#comment-15070302
 ] 

Wangda Tan commented on YARN-4224:
--

Hi [~varun_saxena], [~gtCarrera],

bq. Currently query without entity type is not supported
I feel that we should split the API design from the internal implementation; it 
is quite possible that the web UI wants to make a single RPC call, pull richer 
application entities (i.e., all entities in one app), and render charts 
locally. It's fine if the current implementation doesn't support it; we can 
return a bad response where we cannot support it now. But it is important to 
make an extensible REST API so that we can support it in the future without a 
semantics change.

Thoughts?

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070302#comment-15070302
 ] 

Wangda Tan edited comment on YARN-4224 at 12/23/15 11:13 PM:
-

Hi [~varun_saxena], [~gtCarrera],

bq. Currently query without entity type is not supported
I feel that we should split the API design from the internal implementation; it 
is quite possible that the web UI wants to make a single REST call, pull richer 
application entities (i.e., all entities in one app), and render charts 
locally. It's fine if the current implementation doesn't support it; we can 
return a bad response where we cannot support it now. But it is important to 
make an extensible REST API so that we can support it in the future without a 
semantics change.

Thoughts?


was (Author: leftnoteasy):
Hi [~varun_saxena], [~gtCarrera],

bq. Currently query without entity type is not supported
I feel that we should split API-design and internal implementation, it is quite 
possible that web UI wants to make a single RPC call, pull more rich 
application entities (aka, all entities in one app), and render charts locally. 
It's fine if the currently implementation doesn't support it, we can return bad 
response if we cannot support now. But it will be important to make a 
extensible REST API that we can support it in the future without semantics 
change.

Thoughts?

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070141#comment-15070141
 ] 

Varun Saxena commented on YARN-4224:


Sorry: for entities, we cannot really have an endpoint like 
/ws/v2/timeline/apps/\{app UID\}/entities/\{entitytype\}, because it will clash 
with the hierarchical endpoint for entities.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2015-12-23 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4265:

Attachment: YARN-4265-trunk.001.patch

Thanks [~djp]! I just rebased my patch to the latest trunk. 

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, 
> YARN-4265-trunk.poc_001.patch, YARN-4265.YARN-4234.001.patch, 
> YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior to the 
> EntityFileTimelineStore proposed in YARN-3942, but cache data at cache id 
> granularity instead of application id granularity. Let's have this storage 
> be a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2015-12-23 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4265:

Attachment: (was: YARN-4265-trunk.poc_001.patch)

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior to the 
> EntityFileTimelineStore proposed in YARN-3942, but cache data at cache id 
> granularity instead of application id granularity. Let's have this storage 
> be a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070198#comment-15070198
 ] 

Wangda Tan commented on YARN-4224:
--

Thanks [~varun_saxena],

Synced with [~gtCarrera] about this. I think it's fine to have a two-level 
hierarchy ({{.../timeline/\{parent\}/children}}) to locate entities such as 
apps within a flow or flowruns within a flow. I don't have a strong opinion 
between the two-hierarchy API and adding the parent id as a query parameter 
({{timeline/apps?flowrun=\{flowrun_uid\}}}). 
The most important thing to me for the REST API is allowing the client to 
locate a single object at one hierarchy (such as 
{{timeline/flowruns/\{flowrun_uid\}}}). 
I think we're on the same page for this.
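
For concreteness, the alternatives being weighed look roughly like this 
(placeholders, not final endpoints):
{code}
# Two-hierarchy form: children located under a parent UID
GET /ws/v2/timeline/flowruns/{flowrun_uid}/apps

# Flat form: parent UID passed as a query parameter
GET /ws/v2/timeline/apps?flowrun={flowrun_uid}

# Either way, a single object remains addressable at one hierarchy:
GET /ws/v2/timeline/flowruns/{flowrun_uid}
{code}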

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070198#comment-15070198
 ] 

Wangda Tan edited comment on YARN-4224 at 12/23/15 9:11 PM:


Thanks [~varun_saxena],

Synced with [~gtCarrera] about this. I think it's fine to have a two-level 
hierarchy ({{.../timeline/\{parent\}/children}}) to locate entities such as 
apps within a flow or flowruns within a flow. I don't have a strong opinion 
between the two-hierarchy API and adding the parent id as a query parameter 
({{timeline/apps/?flowrun=\{flowrun_uid\}}}). 
The most important thing to me for the REST API is allowing the client to 
locate a single object at one hierarchy (such as 
{{timeline/flowruns/\{flowrun_uid\}}}). 
I think we're on the same page for this.


was (Author: leftnoteasy):
Thanks [~varun_saxena],

Synced with [~gtCarrera] about this, I think it's fine to me to have two 
hierarchy ({{.timeline/\{parent\}/childrens}} to locate entities such as apps 
within a flow, flowruns within a flow. I don' have strong opinion between the 
two-hierarchy API OR adding parent-id to query parameter 
({{timeline/apps/flowrun=\{flowrun_uid\}}}. 
The most important things to me for the REST API is allowing client locate 
single object at one hierarchy (such as {{timeline/flowruns/\{flowrun_uid\}}}. 
I think we're on the same page for this.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070232#comment-15070232
 ] 

Varun Saxena commented on YARN-4224:


Well, for the hierarchical endpoint we have something like 
{{/ws/v2/timeline/apps/\{appid\}/entities/\{entityType\}}} as the endpoint. 
Shouldn't they be consistent? And if they are consistent, they will clash. 
Maybe for UID we can go with a query param for the entity type, because the UID 
endpoint will primarily be called from the UI, with the entity type always 
supplied. The default limit for the number of entities is 100.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2015-12-23 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070279#comment-15070279
 ] 

Arun Suresh commented on YARN-3870:
---

[~leftnoteasy], with respect to the AM, I was thinking that just having it as a 
field in the ResourceRequest, as well as in the Container (returned by the 
allocate call), would suffice.
From the perspective of the Scheduler, yes, a {{Map}} keyed by request id and 
holding the associated {{ResourceRequest}}s was the direction I was thinking (a 
tiny sketch below). Correct me if I am wrong, but currently there is an 
implicit understanding that all resource requests for the same resource 
requirement should have the same priority. Having an explicit request id would 
allow us to remove that constraint as well.
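
A tiny sketch of the scheduler-side bookkeeping this implies (the explicit 
request id is the proposal here, not an existing API):
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative: keep the host/rack/any expansions of each raw request
// grouped under the proposed explicit request id.
public class RawRequestIndex {
  private final Map<Long, List<ResourceRequest>> byRequestId =
      new HashMap<>();

  public void add(long requestId, ResourceRequest expanded) {
    byRequestId.computeIfAbsent(requestId, k -> new ArrayList<>())
        .add(expanded);
  }

  public List<ResourceRequest> get(long requestId) {
    return byRequestId.getOrDefault(requestId, new ArrayList<>());
  }
}
{code}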

> Providing raw container request information for fine scheduling
> ---
>
> Key: YARN-3870
> URL: https://issues.apache.org/jira/browse/YARN-3870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>Reporter: Lei Guo
>
> Currently, when the AM sends container requests to the RM and scheduler, it 
> expands individual container requests into host/rack/any format. For 
> instance, if I ask for a container request with preference "host1, host2, 
> host3", assuming all are in the same rack rack1, instead of sending one raw 
> container request to the RM/Scheduler with the raw preference list, it 
> basically expands it into 5 different objects: host1, host2, host3, rack1 and 
> any. When the scheduler receives this information, it has already lost the 
> raw request. This is OK for a single container request, but it will cause 
> trouble when dealing with multiple container requests from the same 
> application. Consider this case:
> Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When application requests two containers with different data locality 
> preference:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This will end up with following container request list when client sending 
> request to RM/Scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgement 
> without knowing the raw container request. The situation will get worse when 
> dealing with affinity and anti-affinity, or even gang scheduling, etc.
> We need some way to provide raw container request information for fine 
> scheduling purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4502) Sometimes Two AM containers get launched

2015-12-23 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4502:
-
Labels: 2.6.4-candidate  (was: )

> Sometimes Two AM containers get launched
> 
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Wangda Tan
>Priority: Critical
>  Labels: 2.6.4-candidate
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for the new AM pid 
> Here, the 2nd AM container was supposed to be started as 
> container_e12_1450825622869_0001_02_01. But the AM was not launched in 
> container_e12_1450825622869_0001_02_01; it was in ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: the RM should not start 2 containers for starting the AM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4502) Sometimes Two AM containers get launched

2015-12-23 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4502:
-
Labels:   (was: 2.6.4-candidate)

> Sometimes Two AM containers get launched
> 
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Wangda Tan
>Priority: Critical
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for new AM pid 
> Here, the 2nd AM container was supposed to be started on 
> container_e12_1450825622869_0001_02_01. But the AM was not launched on 
> container_e12_1450825622869_0001_02_01; it was in ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: RM should not start 2 containers for starting the AM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler address

2015-12-23 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070131#comment-15070131
 ] 

Arun Suresh commented on YARN-4083:
---

Hoping to get some consensus on this, since it is required for YARN-2877 as 
well.

I feel having the ContainerExecutor expose a *YARN_SCHEDULER_ADDRESS* 
environment variable (and, as [~jianhe] mentioned, maybe let it be a list, with 
the first entry being the local NM and the remaining a list of RM addresses to 
allow for failover) should work across Java and non-Java applications.

It would also be somewhat dynamic, as [~steve_l] mentioned, since the value is 
decided by the NM right before it launches a container, but unlike a global/ZK 
based registry, it can be different for different containers / applications 
(although it would not change during the lifetime of the container).

bq. how do AM IP filters know when to bounce an HTTP Request over to the proxy
My understanding (at least our requirement for YARN-2877) is that this would be 
used by the AM specifically for resolving the address of the server end of the 
ApplicationMasterProtocol, so HTTP addresses could probably be specified via 
another env variable.

bq. How does this work when the container is actually a Linux container and not 
a fake yarn-level container ?
[~aw], apologies if I did not fully understand, but I feel an environment 
variable should be accessible by Linux, Windows and other containers.

Thoughts?
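
To make the proposal concrete, here is a minimal Java sketch of the consumer 
side. Note this is only an illustration: the *YARN_SCHEDULER_ADDRESS* name and 
the comma-separated format (first entry the local NM, remaining entries RM 
addresses) are assumptions taken from this discussion, not an existing YARN 
contract.
{code}
import java.util.ArrayList;
import java.util.List;

public class SchedulerAddressDiscovery {
  // Parses the proposed environment variable: first entry is the local NM
  // address, the remaining entries are RM addresses for failover.
  public static List<String> schedulerAddresses() {
    String raw = System.getenv("YARN_SCHEDULER_ADDRESS");
    List<String> addrs = new ArrayList<String>();
    if (raw != null && !raw.isEmpty()) {
      for (String a : raw.split(",")) {
        addrs.add(a.trim());
      }
    }
    return addrs; // addrs.get(0) = local NM; the rest = RM failover list
  }
}
{code}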


> Add a discovery mechanism for the scheduler address
> ---
>
> Key: YARN-4083
> URL: https://issues.apache.org/jira/browse/YARN-4083
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> Today many apps like Distributed Shell, REEF, etc. rely on the fact that the 
> HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler 
> address. This JIRA proposes the addition of an explicit discovery mechanism 
> for the scheduler address.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070133#comment-15070133
 ] 

Varun Saxena commented on YARN-4224:


Based on tonight's discussion, UID endpoints can look as follows:

{panel}
*Query multiple flows* : Endpoint is */ws/v2/timeline/flows or 
/ws/v2/timeline/\{clusterid\}/flows*. This query will return a UID of the form 
*cluster:user:flowname* for each flow name.
*Query multiple flowruns* : Endpoint is */ws/v2/timeline/flows/\{flow 
UID\}/runs* where flow UID is of the form *cluster:user:flowname* i.e. the one 
returned in query above. This query returns a UID of the form 
*cluster:user:flowname:runid* for each flow run.
*Query single flowrun* : Endpoint is */ws/v2/timeline/runs/\{flowrun UID\}* 
where flowrun UID is of the form *cluster:user:flowname:runid* i.e. the one 
returned in query above. This query also returns a UID of the form 
*cluster:user:flowname:runid* for the flowrun returned.
*Query multiple apps in a flowrun* : Endpoint can be 
*/ws/v2/timeline/runs/\{flowrun UID\}/apps* where flowrun UID is of the form 
*cluster:user:flowname:runid*. This query also returns a UID of the form 
*cluster:user:flowname:runid:appid* for each app returned.
*Query single app* : Endpoint can be */ws/v2/timeline/apps/\{app UID\}* where 
app UID is of the form *cluster:user:flowname:runid:appid* i.e. the one 
returned in the query above.
*Query Entities* : Endpoint can be */ws/v2/timeline/apps/\{app 
UID\}/entities/\{entitytype\}* or */ws/v2/timeline/apps/\{app 
UID\}/\{entitytype\}*. Thoughts? Entity type is separate because we cannot 
know the entity type when we query apps. This query also returns a UID of the 
form *cluster:user:flowname:runid:appid:entitytype:entityid* for each entity 
returned.
*Query Entity* : Endpoint can be */ws/v2/timeline/entities/\{entity UID\}* 
where entity UID is of the form 
*cluster:user:flowname:runid:appid:entitytype:entityid*
{panel}
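
As a rough illustration of the UID format above, here is a minimal Java sketch 
of joining and splitting the colon-separated parts. The class name is 
hypothetical, and real code would also need to escape the delimiter when it 
occurs inside user or flow names.
{code}
import java.util.Arrays;
import java.util.List;

public class TimelineUid {
  private static final String SEP = ":";

  // e.g. join("cluster1", "user1", "flow1") -> "cluster1:user1:flow1"
  public static String join(String... parts) {
    return String.join(SEP, parts);
  }

  // Inverse of join(). Escaping of SEP inside individual parts is omitted
  // here for brevity.
  public static List<String> split(String uid) {
    return Arrays.asList(uid.split(SEP));
  }
}
{code}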

* One more question we need to discuss is whether the UID really needs to be 
sent from the timeline reader, or whether the client can construct it. 
Basically, can Ember construct it? Please note that things like users, flows, 
etc., i.e. flow context information, will not be available in the app query or 
entity query response, so Ember cannot easily fetch it from the REST response. 
Or would it be easier for Ember if the UID came in the response?

If the UID has to come in the response, we can probably elevate it to 
TimelineEntity as an extra field. Also, as discussed, construction of the UID 
can be done in the Timeline Reader Manager instead of the storage layer.

cc [~sjlee0], [~gtCarrera9], [~leftnoteasy]
Let's reach a consensus and conclude this before the holidays.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4424) Fix deadlock in RMAppImpl

2015-12-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070180#comment-15070180
 ] 

Vinod Kumar Vavilapalli commented on YARN-4424:
---

This originally never made it to branch-2.7.2 even though the fix version was 
set to say so. Thanks to [~djp] for catching this.

I just cherry-picked it for rolling a new RC for 2.7.2. FYI.

> Fix deadlock in RMAppImpl
> -
>
> Key: YARN-4424
> URL: https://issues.apache.org/jira/browse/YARN-4424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
>Priority: Blocker
> Fix For: 2.7.2, 2.6.3
>
> Attachments: YARN-4424.1.patch
>
>
> {code}
> yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn 
> application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
> 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: 
> http://XXX:8188/ws/v1/timeline/
> 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at 
> XXX/0.0.0.0:8050
> 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History 
> server at XXX/0.0.0.0:10200
> {code}
> {code:title=RM log}
> 2015-12-04 21:59:19,744 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000
> 2015-12-04 22:00:50,945 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000
> 2015-12-04 22:02:22,416 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000
> 2015-12-04 22:03:53,593 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 240000
> 2015-12-04 22:05:24,856 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000
> 2015-12-04 22:06:56,235 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000
> 2015-12-04 22:08:27,510 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000
> 2015-12-04 22:09:58,786 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070246#comment-15070246
 ] 

Li Lu commented on YARN-4224:
-

After all these discussions, I think it will be helpful to come up with a 
write-up for our REST API designs. We can post the write-up here so that it's 
much simpler to get a big picture of our reader REST APIs. I can certainly 
help on this. 

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4502) Sometimes Two AM containers get launched

2015-12-23 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4502:
-
Target Version/s: 2.6.4

> Sometimes Two AM containers get launched
> 
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Wangda Tan
>Priority: Critical
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for new AM pid 
> Here, the 2nd AM container was supposed to be started on 
> container_e12_1450825622869_0001_02_01. But the AM was not launched on 
> container_e12_1450825622869_0001_02_01; it was in ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: RM should not start 2 containers for starting the AM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2015-12-23 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070185#comment-15070185
 ] 

Vrushali C commented on YARN-3995:
--

Hi [~Naganarasimha]

Thanks for the thoughts on the jira. I was wondering if the following is a 
feasible solution:

- can the NM maintain a list/map of “zombie app ids” for AMs/collectors that 
it is removing? That way, when metrics arrive at the NM from other NMs for 
those zombie app ids, it can see that this was for an app that previously had 
a collector, and hence most likely still a valid metric/entity, and then 
somehow write that to the backend, perhaps via a “common parent collector” 
process or something.

- we can have the NM periodically prune this zombie list, say a few days after 
app completion, removing the info for that app from the zombie app list.

I am not too knowledgeable about the NM, so I am not sure whether this is 
complicated/infeasible. 


> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045: while testing TestDistributedShell, we found that 
> a few of the container metrics events were failing due to a race condition. 
> When the AM container finishes and removes the collector for the app, there 
> is still a possibility that events published for the app by the current NM 
> and other NMs are still in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4502) Sometimes Two AM containers get launched

2015-12-23 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-4502:


Assignee: Wangda Tan

> Sometimes Two AM containers get launched
> 
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Wangda Tan
>Priority: Critical
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for new AM pid 
> Here, the 2nd AM container was supposed to be started on 
> container_e12_1450825622869_0001_02_01. But the AM was not launched on 
> container_e12_1450825622869_0001_02_01; it was in ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: RM should not start 2 containers for starting the AM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070306#comment-15070306
 ] 

Li Lu commented on YARN-4224:
-

Thanks [~leftnoteasy]! I agree that we should separate the semantics and 
implementations. Our web UI, as one user of the REST API, does not really need 
general queries for timeline entities (I can always attach an entity type if 
needed). However, from the API design perspective, I'd hope our API is general 
enough. Having APIs like "list all entities within one application" may seem 
too ambitious for implementations, but something like "on this endpoint I 
assume you want all entities for this application, but to avoid crashing 
myself I'm only returning a part of it" looks fine. However, enforcing an 
entity type on all such queries and adding it as part of the endpoint looks a 
little suboptimal (it also changes the way we organize resources). 

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070221#comment-15070221
 ] 

Varun Saxena commented on YARN-4224:


Correction - "For UID, we can put entity type as query param and for 
hierarchical endpoint put entity type a path param." But thats not consistent.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2015-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070242#comment-15070242
 ] 

Hadoop QA commented on YARN-4479:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 10 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 373, now 379). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 10s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91
 with JDK v1.7.0_91 generated 1 new issues (was 2, now 3). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 54s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 50s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 152m 28s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 

[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070240#comment-15070240
 ] 

Varun Saxena commented on YARN-4224:


Please note this is specific to the entities endpoint only.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070248#comment-15070248
 ] 

Varun Saxena commented on YARN-4224:


Yes, we can have a write-up. This will be useful for eventual documentation 
as well.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070237#comment-15070237
 ] 

Varun Saxena commented on YARN-4224:


Or we can let it clash: if the string has the decided delimiters, we consider 
it a UID; otherwise, an app id.
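
A minimal sketch of that disambiguation, assuming the ':' delimiter from the 
UID format discussed above (application ids such as 
application_1450825622869_0001 never contain it):
{code}
public class UidOrAppId {
  // Application ids never contain ':', while the proposed UIDs always do,
  // so the presence of the delimiter decides how to parse the path param.
  public static boolean looksLikeUid(String pathParam) {
    return pathParam.indexOf(':') >= 0;
  }
}
{code}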

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4343) Need to support Application History Server on ATSV2

2015-12-23 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070239#comment-15070239
 ] 

Vrushali C commented on YARN-4343:
--

Hi [~Naganarasimha]

Towards the end of today's call, you had mentioned this jira id as one of the 
jiras you wanted some feedback on. I think we discussed this in today's call, 
more or less? I looked through the previous comments and wanted to say that, 
when you get the chance, do lay out your proposal so that we can review it 
further.

thanks
Vrushali


> Need to support Application History Server on ATSV2
> ---
>
> Key: YARN-4343
> URL: https://issues.apache.org/jira/browse/YARN-4343
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> AHS is used by the CLI and Webproxy (REST); if the application-related 
> information is not found in the RM, then it tries to fetch it from AHS and show



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070256#comment-15070256
 ] 

Varun Saxena commented on YARN-4224:


bq.  For UID based queries we can pass type as a query parameter. For the 
hierarchical endpoints, type is modeled as a part of entity ids (we have to do 
this to uniquely id an entity).
IIUC, you mean that we can have the endpoint as /ws/v2/timeline/apps/\{app 
UID\}/entities?entityType=... for UID, 
and the endpoint as /ws/v2/timeline/apps/\{app UID\}/entities/\{entitytype\} 
for the hierarchical REST URL. 
Let's reach an agreement on this then.

Frankly, a query without entity type won't be very useful, but let's do this 
for differentiation. Any issues with making a check for entityType not being 
supplied, though (other than that it is a query param)? Currently, a query 
without entity type is not supported. Some changes, although minor, will have 
to be made in the storage layer for this.
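
For illustration, the two forms would look as follows (the sample UID and 
entity type values are made up):
{code}
# UID form: entity type supplied as a query parameter
GET /ws/v2/timeline/apps/cluster1:user1:flow1:1:application_1450825622869_0001/entities?entityType=YARN_CONTAINER

# hierarchical form: entity type supplied as a path parameter
GET /ws/v2/timeline/apps/cluster1:user1:flow1:1:application_1450825622869_0001/entities/YARN_CONTAINER
{code}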


> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4502) Sometimes Two AM containers get launched

2015-12-23 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-4502:


 Summary: Sometimes Two AM containers get launched
 Key: YARN-4502
 URL: https://issues.apache.org/jira/browse/YARN-4502
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Priority: Critical


Scenario : 
* set yarn.resourcemanager.am.max-attempts = 2
* start dshell application
{code}
 yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
hadoop-yarn-applications-distributedshell-*.jar 
-attempt_failures_validity_interval 6 -shell_command "sleep 150" 
-num_containers 16
{code}
* Kill AM pid
* Print container list for 2nd attempt
{code}
yarn container -list appattempt_1450825622869_0001_02
INFO impl.TimelineClientImpl: Timeline service address: 
http://xxx:port/ws/v1/timeline/
INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
Total number of containers :2
Container-Id Start Time Finish Time 
  StateHost   Node Http Address 
   LOG-URL
container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015   
N/A RUNNINGxxx:25454   http://xxx:8042 
http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015   
N/A RUNNINGxxx:25454   http://xxx:8042 
http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
{code}
* look for new AM pid 


Here, the 2nd AM container was supposed to be started on 
container_e12_1450825622869_0001_02_01. But the AM was not launched on 
container_e12_1450825622869_0001_02_01; it was in ACQUIRED state. 
On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 

Expected behavior: RM should not start 2 containers for starting the AM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070266#comment-15070266
 ] 

Varun Saxena commented on YARN-4224:


An important point: for the entity table, the row keys are not sorted by 
created time. So when we fetch records from HBase, a limit of 100, for 
instance, does not mean that we stop after fetching the first 100 records. We 
will continue fetching records as long as the row prefix matches, and keep 
removing the last entity based on created time to limit the result to 100 
entities. So quite a few rows are scanned. If we do not make entity type 
mandatory, this would mean scanning even more rows, especially since, for the 
generic entity table, the entity type can be anything. So I would prefer a 
check making entity type mandatory.
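
A minimal sketch of the limit handling described above (the entity type here 
is a simplified stand-in, not the real TimelineEntity): every row matching the 
prefix is still scanned, and only the retained set is bounded.
{code}
import java.util.Comparator;
import java.util.TreeSet;

public class CreatedTimeLimiter {
  // Simplified stand-in for a timeline entity.
  static class Entity {
    final String id;
    final long createdTime;
    Entity(String id, long createdTime) {
      this.id = id;
      this.createdTime = createdTime;
    }
  }

  // Keeps only the 'limit' newest entities by created time while scanning an
  // unordered row stream: the limit bounds the result size, not the scan cost.
  public static TreeSet<Entity> limitWhileScanning(Iterable<Entity> rows,
      int limit) {
    TreeSet<Entity> best = new TreeSet<Entity>(new Comparator<Entity>() {
      public int compare(Entity a, Entity b) {
        int c = Long.compare(b.createdTime, a.createdTime); // newest first
        return c != 0 ? c : a.id.compareTo(b.id);
      }
    });
    for (Entity e : rows) {
      best.add(e);
      if (best.size() > limit) {
        best.pollLast(); // evict the oldest entity in the retained set
      }
    }
    return best;
  }
}
{code}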

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4502) Sometimes Two AM containers get launched

2015-12-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070289#comment-15070289
 ] 

Wangda Tan commented on YARN-4502:
--

Thanks to [~yeshavora] for reporting this issue.

Looked at this issue with [~jianhe]/[~vinodkv]; the root cause of this problem is:
- After YARN-3535, all containers transitioning from the ALLOCATED to the 
KILLED state will be re-added to the scheduler, and such resource requests 
will be added to the *current* scheduler application attempt.
- If some containers are in ALLOCATED state and the AM crashes, the resource 
requests of these containers could be added to the *new* scheduler application 
attempt.
- When the new application attempt requests the AM container, it calls
{code}
// AM resource has been checked when submission
Allocation amContainerAllocation =
    appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
        Collections.singletonList(appAttempt.amReq),
        EMPTY_CONTAINER_RELEASE_LIST, null, null);
if (amContainerAllocation != null
    && amContainerAllocation.getContainers() != null) {
  assert (amContainerAllocation.getContainers().size() == 0);
}
{code}
Some containers could be allocated by this scheduler.allocate call; these 
containers will be ignored because the *assert* above is not enabled in 
production environments.
- So this can result in containers being leaked when we allocate retried AM 
containers.

*Possible fixes*:
1) Release all allocated containers of {{amContainerAllocation.getContainers()}} 
(see the sketch after this list),
OR
2) Instead of using {{getCurrentAttemptForContainer}} in 
{{AbstractYarnScheduler#recoverResourceRequestForContainer}}, we should only 
recover the ResourceRequest to the attempt which owns the container.
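
A rough sketch of option 1 (illustrative only, not a committed patch), reusing 
the allocate signature shown above, in the same context as that snippet, to 
hand the unexpected containers back through the release list:
{code}
// Instead of only asserting, explicitly release any containers that were
// unexpectedly allocated by the AM-container allocate call.
if (amContainerAllocation != null
    && amContainerAllocation.getContainers() != null
    && !amContainerAllocation.getContainers().isEmpty()) {
  List<ContainerId> toRelease = new ArrayList<ContainerId>();
  for (Container c : amContainerAllocation.getContainers()) {
    toRelease.add(c.getId());
  }
  // empty ask list; the third argument is the container release list
  appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
      Collections.<ResourceRequest>emptyList(), toRelease, null, null);
}
{code}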

> Sometimes Two AM containers get launched
> 
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Wangda Tan
>Priority: Critical
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for new AM pid 
> Here, the 2nd AM container was supposed to be started on 
> container_e12_1450825622869_0001_02_01. But the AM was not launched on 
> container_e12_1450825622869_0001_02_01; it was in ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: RM should not start 2 containers for starting the AM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070295#comment-15070295
 ] 

Li Lu commented on YARN-4224:
-

Well, returning the first 100 entities is just one example (we could even say 
return one random entity within the given application, for example). For API 
design, we don't want implementations to affect our interfaces too much. 
Entity type is not a mandatory part of an entity query, so we can keep it 
optional for entity queries. 

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070632#comment-15070632
 ] 

Varun Saxena commented on YARN-4224:


bq. At any rate, I agree that due to the possibility of omission ambiguities 
are perhaps possible. In that case, I suspect using different query nouns might 
be the ultimate solution (e.g. "apps" for the hierachical and "apps-uid" for 
UIDs).
Although it sounds awkward, I am leaning towards it as well.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070658#comment-15070658
 ] 

Varun Saxena commented on YARN-4224:


BTW, even in ATSv1 the REST endpoint for fetching multiple entities looks like 
{{/ws/v1/timeline/\{entitytype\}}}, which means multiple entities are returned 
within the scope of an entity type. So there might not be a use case for this. 
Anyway, in v2 we can change that with the knowledge that queries without 
entity type may be slow with the HBase implementation. 

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2015-12-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070635#comment-15070635
 ] 

Naganarasimha G R commented on YARN-3995:
-

Thanks for the comments [~sjlee0], 
IIUC, the 2nd point is a continuation of the first idea, right?
bq. I am not too knowledgeable about the NM and so not sure if this is 
complicated/infeasible.
{{PerNodeTimelineCollectorsAuxService}} can take this responsibility, so I 
don't see any problem with it on the NM side, right?

I can think of a small modification on top of your idea (sketched after this 
list):
* Once the NM notifies the auxiliary service that the app is finished (via the 
container-finished call in the existing way), 
{{PerNodeTimelineCollectorsAuxService}} can move this collector to a zombie 
collector map.
* This map stores the time the last event was published through the zombie 
collector.
* We can have one thread running that checks which zombie collectors have been 
inactive for a configurable time period and then removes them.

Thus none of the events are lost till the end. For example, we can keep this 
period at 2 minutes; if a collector in the zombie list is not active for 2 
minutes, then remove and close it?
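
A minimal sketch of the zombie-collector map with an inactivity-based sweep; 
the class name and the close hook are hypothetical:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ZombieCollectorReaper {
  // appId -> time of the last event published through the zombie collector
  private final Map<String, Long> lastActive =
      new ConcurrentHashMap<String, Long>();
  private final long lingerMillis;
  private final ScheduledExecutorService sweeper =
      Executors.newSingleThreadScheduledExecutor();

  public ZombieCollectorReaper(long lingerMillis) {
    this.lingerMillis = lingerMillis;
    // One thread periodically removes collectors that have been inactive for
    // the full linger period; a write in between pushes the timeout forward.
    sweeper.scheduleWithFixedDelay(new Runnable() {
      public void run() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : lastActive.entrySet()) {
          if (now - e.getValue() > ZombieCollectorReaper.this.lingerMillis) {
            lastActive.remove(e.getKey());
            // closeCollector(e.getKey()); // hypothetical close hook
          }
        }
      }
    }, lingerMillis, lingerMillis, TimeUnit.MILLISECONDS);
  }

  // Called on every event published for a zombie app.
  public void touch(String appId) {
    lastActive.put(appId, System.currentTimeMillis());
  }
}
{code}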

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045: while testing TestDistributedShell, we found that 
> a few of the container metrics events were failing due to a race condition. 
> When the AM container finishes and removes the collector for the app, there 
> is still a possibility that events published for the app by the current NM 
> and other NMs are still in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070634#comment-15070634
 ] 

Varun Saxena commented on YARN-4224:


In short, a limit on the number of entities to return won't have any impact on 
the number of rows to scan. We will have to scan all possible rows for that 
row prefix.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070649#comment-15070649
 ] 

Varun Saxena commented on YARN-4224:


bq. For API design, we don't want implementations to affect our interfaces too 
much
That is a fair point.
But then our main implementation on HBase may not be able to support it with 
good performance.
And frankly, if we keep entity type as an optional query param, shouldn't we 
keep it optional even for the hierarchical endpoint? Why only for the UID 
endpoint?

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070652#comment-15070652
 ] 

Naganarasimha G R commented on YARN-3367:
-

Thanks for the comments [~sjlee0],
bq. I would very much advocate using the JDK's ExecutorService (single-threaded 
executor in this case) over using a raw thread and its own blocking queue 
management.
Maybe I didn't get your thoughts completely, but let me explain the reasons I 
took this approach. Some points I have considered:
* We require all the events to be pushed in the order in which they are 
submitted. I am not sure whether we require order to be maintained across sync 
and async events, but within sync/async it is definitely required, based on 
[~djp]'s last comment. (For example, metric events which are sent async need 
to be ordered so that the aggregation logic works properly, and sync events 
like container started/stopped so that the container's state can be determined 
if there are any intermittent daemon failures.)
* As it is single-threaded, it is better to merge the related events and push 
them at once (e.g. all the waiting async events can be clubbed and pushed at 
once; see the drain-and-merge sketch below).
* For the sync events we need to throw an exception on failure, so that the 
caller is informed that it failed.

Considering this, I thought of maintaining a blocking queue and a thread, so 
that whenever data is available the code in the thread can take action (and by 
the time the thread finishes publishing and comes back to read the queue, if 
multiple async entities are there, it can merge and publish them in the next 
round).
Maybe the complexity will get reduced if we *need not maintain the order* 
across sync and async events.
Or please let me know if I have increased the scope of the jira beyond what is 
required.
bq. On TimelineEntities.java,
Yep, I can incorporate those changes; I just relied on Eclipse auto code 
generation for hashCode and equals for a given class.
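
A minimal sketch of the drain-and-merge behavior described in the second 
bullet above; the payload type and the publish call are placeholders:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncEntityPublisher implements Runnable {
  // Placeholder payload type; the real client would queue TimelineEntity.
  private final BlockingQueue<Object> queue =
      new LinkedBlockingQueue<Object>();

  public void putEntityAsync(Object entity) {
    queue.offer(entity); // caller returns immediately
  }

  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        List<Object> batch = new ArrayList<Object>();
        batch.add(queue.take()); // block until at least one entity arrives
        queue.drainTo(batch);    // club everything else already waiting
        publish(batch);          // one REST call for the whole batch
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }

  private void publish(List<Object> batch) {
    // placeholder for a single PUT to the collector with all drained entities
  }
}
{code}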
 


> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we added a loop in TimelineClient to wait for 
> collectorServiceAddress to be ready before posting any entity. In consumers of 
> TimelineClient (like the AM), we are starting a new thread for each call to 
> get rid of a potential deadlock in the main thread. This approach has at 
> least 3 major defects:
> 1. The consumer needs some additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting 
> operation thread gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread handles delivering the entities in the queue to the collector via REST 
> calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2015-12-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070709#comment-15070709
 ] 

Naganarasimha G R commented on YARN-3995:
-

bq. If I recall, this window of opportunity is going to be quite small because 
any non-AM container will be completed before the app can be finished (and the 
AM container is completed).
This is true in most cases, unless the AM doesn't wait for the containers it 
launched/requested to go down before it goes down itself. 
I ran TestDistributedShell and cross-verified the logs for any errors due to 
the collector not being there, and didn't find any for the containers it 
launched. But TestDistributedShell launches only 2 containers; if we run with 
more containers, then we can find the impact.

bq. I suspect a simple linger might be sufficient, but do we see a case where 
we might miss writes otherwise?
Yes, a simple linger should be sufficient. Shall I make this a configurable 
period, so that there is a backup option in case of any issues, and if 
required in future we can handle it in a better way (a sketch of such a 
property follows)? Also, is launching one thread per collector for closing it 
fine? IMO a configurable linger period is sufficient.
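
A sketch of what such a property could look like; the key name is purely 
hypothetical and does not exist in yarn-default.xml:
{code}
<!-- Hypothetical linger period for app-level timeline collectors: how long
     a collector is kept after the AM container finishes. -->
<property>
  <name>yarn.nodemanager.timeline-collector.linger-period-ms</name>
  <value>120000</value> <!-- e.g. the 2 minutes suggested earlier -->
</property>
{code}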


> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045: while testing TestDistributedShell, we found that 
> a few of the container metrics events were failing due to a race condition. 
> When the AM container finishes and removes the collector for the app, there 
> is still a possibility that events published for the app by the current NM 
> and other NMs are still in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2015-12-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070690#comment-15070690
 ] 

Sangjin Lee commented on YARN-3995:
---

If I recall, this window of opportunity is going to be quite small because any 
non-AM container will be completed before the app can be finished (and the AM 
container is completed). For this inversion to occur, there would have to be 
writes that originate from a remote NM that had a container (which had already 
been completed) but get delayed in reaching the timeline collector for some 
reason.

I suspect a simple linger might be sufficient, but do we see a case where we 
might miss writes otherwise?

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045: while testing TestDistributedShell, we found that 
> a few of the container metrics events were failing due to a race condition. 
> When the AM container finishes and removes the collector for the app, there 
> is still a possibility that events published for the app by the current NM 
> and other NMs are still in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4462) FairScheduler: Disallow preemption from a queue

2015-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070707#comment-15070707
 ] 

Hadoop QA commented on YARN-4462:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 76, now 77). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 8 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 7s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 31s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 138m 46s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA 

[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2015-12-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070716#comment-15070716
 ] 

Varun Saxena commented on YARN-3995:


bq. what I am trying to suggest is to close/remove the collector only after a 
period of inactivity in the collector
That would be better. I guess what you mean is that instead of a hard timeout, 
we will have a rolling timeout, i.e. the timeout keeps getting pushed back as 
entities are written. The collector will only time out once no entities have 
been written for the specified period.
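
For illustration, a minimal sketch of such a rolling inactivity timeout, where 
every write pushes the removal deadline back. All class and method names here 
are hypothetical, not the actual collector code:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: removes a collector only after a quiet period.
// Every entity write reschedules the removal, implementing the rolling
// timeout described above. Races with an already-fired removal task are
// ignored for brevity.
class CollectorInactivityTimer {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final long inactivityMs;
  private final Runnable removeCollector;
  private ScheduledFuture<?> pending;

  CollectorInactivityTimer(long inactivityMs, Runnable removeCollector) {
    this.inactivityMs = inactivityMs;
    this.removeCollector = removeCollector;
  }

  // Called on every entity write; pushes the removal deadline back.
  synchronized void onEntityWritten() {
    if (pending != null) {
      pending.cancel(false);
    }
    pending = scheduler.schedule(removeCollector, inactivityMs,
        TimeUnit.MILLISECONDS);
  }
}
{code}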

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045:  While testing in TestDistributedShell found out 
> that few of the container metrics events were failing as there will be race 
> condition. When the AM container finishes and removes the collector for the 
> app, still there is possibility that all the events published for the app by 
> the current NM and other NM are still in pipeline, 





[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070715#comment-15070715
 ] 

Sangjin Lee commented on YARN-3367:
---

I haven't fully digested the code so I might be off base. But when I see 
variables such as stopped, waitForDrained, and drained alongside the thread 
and the queue, it feels rather like reinventing the wheel. Also, I see two 
sets of wait-notify pairs. Using the executor service should remove the need 
for those, and hopefully we can wrap the code between taking an item off the 
queue and looping back into a callable/runnable.
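
A minimal sketch of what that could look like, assuming a stand-in 
postToCollector() for the actual REST call; the single-threaded executor's own 
queue replaces the hand-rolled queue and both wait/notify pairs:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only: a single-threaded executor serializes posts in submission
// order, so no explicit "drained" flag or wait/notify is needed.
class AsyncEntityPoster {
  private final ExecutorService executor =
      Executors.newSingleThreadExecutor();

  void putEntitiesAsync(final Object entities) {
    executor.submit(new Runnable() {
      @Override
      public void run() {
        postToCollector(entities); // hypothetical REST call
      }
    });
  }

  void stop() throws InterruptedException {
    executor.shutdown();                            // stop accepting work
    executor.awaitTermination(5, TimeUnit.SECONDS); // drain pending posts
  }

  private void postToCollector(Object entities) {
    // placeholder for the actual HTTP POST to the collector
  }
}
{code}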

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we added a loop in TimelineClient to wait for 
> collectorServiceAddress to be ready before posting any entity. In consumers 
> of TimelineClient (like the AM), we start a new thread for each call to 
> avoid a potential deadlock in the main thread. This approach has at least 3 
> major defects:
> 1. The consumer needs additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It wastes thread resources unnecessarily.
> 3. The sequence of events could be out of order because each posting thread 
> exits the waiting loop at a random time.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread delivers the queued entities to the collector via REST calls.





[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-23 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070594#comment-15070594
 ] 

Jian He commented on YARN-4138:
---

Sorry, my bad. I don't know why AllocationExpirationInfo.java previously ended 
up in the hadoop-yarn/ directory.

Seems the patch is not applying on trunk any more.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after a container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior so that when a container resource 
> increase token expires, the container's resource allocation is reverted to 
> the value before the increase.
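
Roughly, the proposed behavior amounts to remembering the last confirmed 
allocation and restoring it on expiry. A sketch with illustrative names only 
(not the actual patch; String ids and plain ints stand in for the YARN 
ContainerId/Resource types):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the proposed rollback: on increase-token expiry,
// shrink the container back to its last confirmed size instead of killing it.
class IncreaseExpiryHandler {
  private final Map<String, Integer> lastConfirmedMemoryMb =
      new ConcurrentHashMap<String, Integer>();

  // Remember the pre-increase allocation when the increase is granted.
  void onIncreaseGranted(String containerId, int preIncreaseMemoryMb) {
    lastConfirmedMemoryMb.put(containerId, preIncreaseMemoryMb);
  }

  // Token expired without the AM using the increase: roll back, don't kill.
  void onIncreaseTokenExpired(String containerId) {
    Integer previous = lastConfirmedMemoryMb.remove(containerId);
    if (previous != null) {
      scheduleDecrease(containerId, previous);
    }
  }

  private void scheduleDecrease(String containerId, int memoryMb) {
    // placeholder for the scheduler call that reverts the allocation
  }
}
{code}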





[jira] [Commented] (YARN-4343) Need to support Application History Server on ATSV2

2015-12-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070600#comment-15070600
 ] 

Naganarasimha G R commented on YARN-4343:
-

Hi [~vrushalic], thanks for getting back on this. Maybe I confused you a bit: 
what I meant was that Varun and I discussed this issue, and I will try to come 
up with a rough approach or a WIP patch to discuss further!

> Need to support Application History Server on ATSV2
> ---
>
> Key: YARN-4343
> URL: https://issues.apache.org/jira/browse/YARN-4343
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> AHS is used by the CLI and the web proxy (REST); if the application-related 
> information is not found in the RM, it tries to fetch it from the AHS and show
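
The lookup pattern described here is essentially try-RM-then-fall-back-to-AHS. 
A self-contained sketch with simplified stand-in types (not the real client 
classes):

{code:java}
// Self-contained sketch of the RM-then-AHS fallback; all types below are
// simplified stand-ins, not the real YARN client classes.
class ApplicationNotFoundException extends Exception {}

interface RmClient {
  AppReport getApplicationReport(String appId)
      throws ApplicationNotFoundException;
}

interface AhsClient {
  AppReport getApplicationReport(String appId);
}

class AppReport {}

class AppLookup {
  private final RmClient rm;
  private final AhsClient ahs;

  AppLookup(RmClient rm, AhsClient ahs) {
    this.rm = rm;
    this.ahs = ahs;
  }

  AppReport getApplicationReport(String appId) {
    try {
      // Running and recently finished apps are still known to the RM.
      return rm.getApplicationReport(appId);
    } catch (ApplicationNotFoundException e) {
      // No longer in the RM; fall back to the Application History Server.
      return ahs.getApplicationReport(appId);
    }
  }
}
{code}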





[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070611#comment-15070611
 ] 

Sangjin Lee commented on YARN-3367:
---

Thanks for the patch [~Naganarasimha]! I took a fairly quick look at the latest 
patch.

One high level comment: as for the async dispatcher, I would very much advocate 
using the JDK's ExecutorService (single-threaded executor in this case) over 
using a raw thread and its own blocking queue management. It will definitely 
reduce the amount of code (and room for errors), and we can focus on the actual 
unit of work that needs to be done. Can you please consider using an 
ExecutorService over the thread + queue? If there is a compelling reason that 
an ExecutorService cannot work, I'd be curious to learn.

On TimelineEntities.java,
- hashCode(): why not simply return entities.hashCode()? entities is never null
- equals(): again, note that entities is never null; that will simplify the 
implementation here
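
Concretely, with entities guaranteed non-null, the two methods collapse to 
something like this (a sketch against a simplified stand-in class, not the 
exact file):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for TimelineEntities, showing the suggestion:
// because "entities" is initialized at construction and never null,
// no null checks are needed in hashCode() or equals().
class TimelineEntitiesSketch {
  private final List<Object> entities = new ArrayList<Object>();

  @Override
  public int hashCode() {
    return entities.hashCode();
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof TimelineEntitiesSketch)) {
      return false;
    }
    return entities.equals(((TimelineEntitiesSketch) obj).entities);
  }
}
{code}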

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we added a loop in TimelineClient to wait for 
> collectorServiceAddress to be ready before posting any entity. In consumers 
> of TimelineClient (like the AM), we start a new thread for each call to 
> avoid a potential deadlock in the main thread. This approach has at least 3 
> major defects:
> 1. The consumer needs additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It wastes thread resources unnecessarily.
> 3. The sequence of events could be out of order because each posting thread 
> exits the waiting loop at a random time.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread delivers the queued entities to the collector via REST calls.





[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2015-12-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070660#comment-15070660
 ] 

Naganarasimha G R commented on YARN-3995:
-

Oops, sorry, my mistake. Thanks [~sjlee0] for correcting me.
[~sjlee0], the current code already waits for a second in a separate thread 
after the AM container is closed (in 
PerNodeTimelineCollectorsAuxService.stopContainer). The issue with that 
approach is that it closes after 1 second even though events are still coming 
in; what I am trying to suggest is to close/remove the collector only after a 
period of inactivity in the collector. Would that be acceptable, considering 
metrics are usually delayed?
If the above approach is not needed, the existing approach already waits for a 
second in a separate thread; does it require any change? (The least I can 
think of is that a few extra threads will exist if more AMs are run from a 
single NM.)

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045: while testing TestDistributedShell, we found that 
> a few of the container metrics events were failing because of a race 
> condition. When the AM container finishes and removes the collector for the 
> app, there is still a possibility that events published for the app by the 
> current NM and other NMs are still in the pipeline, 





[jira] [Commented] (YARN-2599) Standby RM should also expose some jmx and metrics

2015-12-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070678#comment-15070678
 ] 

Rohith Sharma K S commented on YARN-2599:
-

Gearing up on this issue, I assume that just exempting cluster metrics and jmx 
from redirecting to the active RM would be sufficient. But one thing I noticed 
is that in YARN-1898, JMX and metrics were removed from NON_REDIRECTED_URIS in 
the addendum patch. Is the intention of this JIRA to revert that, or do we 
need to add more JMX metrics to support HA, as in YARN-2442?
[~kasha], would you please share your thoughts?
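
For context, the mechanism under discussion is a set of paths that the web 
filter serves locally instead of redirecting to the active RM. A simplified 
sketch (illustrative only, not the real filter code; the entries shown are 
assumptions, and whether /jmx and /metrics belong in the set is exactly the 
open question here):

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified sketch of the redirect check: paths in the set are served by
// the standby itself. "/jmx" and "/metrics" are shown as they would look
// if re-added; the real constant lives in the RM web filter.
class RedirectPolicySketch {
  private static final Set<String> NON_REDIRECTED_URIS =
      new HashSet<String>(Arrays.asList(
          "/conf", "/stacks", "/logLevel", "/logs",
          "/jmx", "/metrics"));

  boolean shouldRedirectToActive(String path) {
    return !NON_REDIRECTED_URIS.contains(path);
  }
}
{code}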

> Standby RM should also expose some jmx and metrics
> --
>
> Key: YARN-2599
> URL: https://issues.apache.org/jira/browse/YARN-2599
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Rohith Sharma K S
>
> YARN-1898 redirects jmx and metrics to the Active. As discussed there, we 
> need to separate out metrics displayed so the Standby RM can also be 
> monitored. 





[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2015-12-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070723#comment-15070723
 ] 

Sangjin Lee commented on YARN-3995:
---

bq. This is true in most of the cases, unless the AM does not wait for the 
containers it launched/requested to go down before it goes down itself.

Are you thinking of cases where the AM crashes? If the app finishes normally, 
this sequence does not happen, right?

bq. Yes, a simple linger should be sufficient. Shall I make this period 
configurable, so that there is a backup option in case of any issues, and so 
that we can handle it in a better way in the future if required?

Making it configurable sounds fine to me.

bq. Also, is launching one thread per collector to close it fine?

I suspect it would be fine. Note that there would be a few collectors per NM at 
most.
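
A configurable linger would then just be a property read when the service 
starts. A sketch using Hadoop's Configuration, with a made-up property name 
and default for illustration:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: both the key and the default are made up for illustration;
// no such property exists in yarn-default.xml.
class CollectorLingerConfig {
  static final String LINGER_MS_KEY =
      "yarn.nodemanager.timeline-collector.linger-ms";
  static final long DEFAULT_LINGER_MS = 1000L;

  static long getLingerMs(Configuration conf) {
    return conf.getLong(LINGER_MS_KEY, DEFAULT_LINGER_MS);
  }
}
{code}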

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045: while testing TestDistributedShell, we found that 
> a few of the container metrics events were failing because of a race 
> condition. When the AM container finishes and removes the collector for the 
> app, there is still a possibility that events published for the app by the 
> current NM and other NMs are still in the pipeline, 




