[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051236#comment-15051236 ] Junping Du commented on YARN-3623: -- bq. "it means the cluster will and should bring up the timeline service v.1.5 (and nothing else)." Thinking it over again, I am still uncomfortable with the description "(and nothing else)". It could work against our possible solutions for YARN-4368. I prefer to remove it in an addendum patch; otherwise it could impose unnecessary restrictions if it gets released in 2.8.0. Thoughts? > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
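The setting under discussion would be configured cluster-wide in yarn-site.xml roughly as below. This is a sketch: the property name follows the patch discussed above, but the exact value format (a floating-point version such as 1.5) is an assumption of this example.

```xml
<!-- Sketch of the cluster-wide Timeline Service version setting.
     The float-style value (e.g. 1.5) is an assumption of this example. -->
<property>
  <name>yarn.timeline-service.version</name>
  <value>1.5</value>
</property>
<property>
  <!-- existing on/off switch; the version config complements it -->
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
```

With this in place, frameworks can read one shared version value instead of each carrying its own v1/v2 switch, which is the motivation stated in the issue description.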
[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct
[ https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051363#comment-15051363 ] Junping Du commented on YARN-4434: -- Hi [~ajisakaa], we are moving on to 2.6.3-RC0. Would you hold off any commits (unless they are agreed blockers) to branch-2.6.3 in later commit efforts? Thanks. > NodeManager Disk Checker parameter documentation is not correct > --- > > Key: YARN-4434 > URL: https://issues.apache.org/jira/browse/YARN-4434 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, nodemanager >Affects Versions: 2.6.0, 2.7.1 >Reporter: Takashi Ohnishi >Assignee: Weiwei Yang >Priority: Minor > Fix For: 2.8.0, 2.6.3, 2.7.3 > > Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch > > > In the description of > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage, > it says > {noformat} > The default value is 100 i.e. the entire disk can be used. > {noformat} > But, in yarn-default.xml and source code, the default value is 90. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
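For reference, operators who want the old documented behavior (the entire disk usable) have to override the property explicitly, since the shipped default is 90 as the report notes. A yarn-site.xml sketch:

```xml
<!-- NodeManager marks a local dir as bad once disk utilization exceeds
     this percentage. Default in yarn-default.xml and code is 90, not the
     100 the old documentation claimed; set it explicitly to override. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
```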
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Description: While debugging in Eclipse I came across a scenario where I needed to know the name of a queue, but every time I tried to inspect the queue it hung. On seeing the stack I realized there was a deadlock, and on analysis found that it was only due to *queue.toString()* during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure the following: # queueCapacity and resource-usage have their own read/write locks, hence synchronization is not required # numContainers is volatile, hence synchronization is not required. # a read/write lock could be added to OrderingPolicy. Read operations don't need to be synchronized, so {{getNumApplications}} doesn't need to be synchronized. (The first 2 will be handled in this jira and the third will be handled in YARN-4443) was: While debugging in Eclipse I came across a scenario where I needed to know the name of a queue, but every time I tried to inspect the queue it hung. On seeing the stack I realized there was a deadlock, and on analysis found that it was only due to *queue.toString()* during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure the following: # queueCapacity and resource-usage have their own read/write locks. 
# > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in Eclipse I came across a scenario where I needed to know > the name of a queue, but every time I tried to inspect the queue it hung. > On seeing the stack I realized there was a deadlock, and on analysis found > that it was only due to *queue.toString()* during debugging, as > {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need > to ensure the following: > # queueCapacity and resource-usage have their own read/write locks, hence > synchronization is not required > # numContainers is volatile, hence synchronization is not required. > # a read/write lock could be added to OrderingPolicy. Read operations don't > need to be synchronized, so {{getNumApplications}} doesn't need to be > synchronized. > (The first 2 will be handled in this jira and the third will be handled in > YARN-4443) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
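The locking scheme proposed above can be sketched as follows. This is an illustrative toy, not the actual AbstractCSQueue code (the class and field names are invented for the example): a read/write lock guards the capacity field, numContainers stays volatile, and toString() takes no monitor, so a debugger thread calling it cannot participate in a lock-ordering deadlock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the fix: replace synchronized getters with a
// read/write lock, and rely on volatile for the simple counter.
class QueueUsageSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private float absoluteUsedCapacity = 0f;
    private volatile int numContainers = 0; // volatile: readers need no lock

    float getAbsoluteUsedCapacity() {
        lock.readLock().lock(); // readers never block each other
        try {
            return absoluteUsedCapacity;
        } finally {
            lock.readLock().unlock();
        }
    }

    void setAbsoluteUsedCapacity(float v) {
        lock.writeLock().lock(); // writers get exclusive access
        try {
            absoluteUsedCapacity = v;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // assumes a single writer thread; volatile alone makes the
    // increment visible but not atomic across multiple writers
    void incNumContainers() { numContainers++; }

    int getNumContainers() { return numContainers; }

    @Override
    public String toString() {
        // safe from any thread: no object monitor is acquired here
        return "usedCapacity=" + getAbsoluteUsedCapacity()
             + ", numContainers=" + getNumContainers();
    }
}
```

The key point mirrored from the description: because toString() no longer needs the queue's monitor, inspecting a queue in a debugger cannot hang on a lock held elsewhere.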
[jira] [Created] (YARN-4443) Improve locks of OrderingPolicy
Naganarasimha G R created YARN-4443: --- Summary: Improve locks of OrderingPolicy Key: YARN-4443 URL: https://issues.apache.org/jira/browse/YARN-4443 Project: Hadoop YARN Issue Type: Sub-task Reporter: Naganarasimha G R Assignee: Naganarasimha G R Improve the locks of OrderingPolicy, as it is tightly coupled with LeafQueue for its consistency. We should decouple it from LeafQueue for a better API design. Potentially we need to rethink the API of OrderingPolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051470#comment-15051470 ] Hadoop QA commented on YARN-4399: -
-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 7m 54s | trunk passed |
| +1 | compile | 0m 30s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 31s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 38s | trunk passed |
| +1 | mvneclipse | 0m 16s | trunk passed |
| +1 | findbugs | 1m 19s | trunk passed |
| -1 | javadoc | 0m 25s | hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. |
| +1 | javadoc | 0m 30s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 38s | the patch passed |
| +1 | compile | 0m 33s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 33s | the patch passed |
| +1 | compile | 0m 33s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 33s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | mvnsite | 0m 41s | the patch passed |
| +1 | mvneclipse | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 29s | the patch passed |
| -1 | javadoc | 0m 27s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| +1 | javadoc | 0m 30s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 67m 11s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 67m 8s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 23s | Patch does not generate ASF License warnings. |
| | | 153m 27s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776828/YARN-4399.003.patch |
| JIRA Issue | YARN-4399 |
| Optional Tests | asflicense compile javac
[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3946: Attachment: YARN-3946.v1.008.patch Thanks for pointing it out [~wangda]; I have corrected the test case and fixed the applicable checkstyle issues. > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, > YARN-3946.v1.007.patch, YARN-3946.v1.008.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Attachment: YARN-4416.v2.001.patch Hi [~wangda], as per your earlier comment I have split the jira into two and focused only on removing synchronization from the methods that are used in {{toString}}, excluding the changes required for the ordering policy. Further, I have taken care of two things: * absoluteCapacityResource was not used, hence I have removed the variable and the method * getNumContainers was unnecessarily overridden in LeafQueue, hence I removed it. Please provide your feedback. > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > YARN-4416.v2.001.patch, deadlock.log > > > While debugging in Eclipse I came across a scenario where I needed to know > the name of a queue, but every time I tried to inspect the queue it hung. > On seeing the stack I realized there was a deadlock, and on analysis found > that it was only due to *queue.toString()* during debugging, as > {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need > to ensure the following: > # queueCapacity and resource-usage have their own read/write locks, hence > synchronization is not required > # numContainers is volatile, hence synchronization is not required. > # a read/write lock could be added to OrderingPolicy. Read operations don't > need to be synchronized, so {{getNumApplications}} doesn't need to be > synchronized. > (The first 2 will be handled in this jira and the third will be handled in > YARN-4443) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051478#comment-15051478 ] Hudson commented on YARN-3623: -- FAILURE: Integrated in Hadoop-trunk-Commit #8951 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8951/]) YARN-3623-Addendum: Improve the description for Timeline Service Version (xgong: rev 21daa6c68a0bff51a14e748bf14d56b2f5a5580f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051498#comment-15051498 ] Sangjin Lee commented on YARN-4356: --- Thanks for the feedback. Now that YARN-3623 has been committed, I'll first cherry-pick that commit over to our branch. I'll also update the patch based on comments from [~djp] and [~Naganarasimha]. > ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.002.patch, > YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, > YARN-4356-feature-YARN-2928.005.patch, > YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4422) Generic AHS sometimes doesn't show started, node, or logs on App page
[ https://issues.apache.org/jira/browse/YARN-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051548#comment-15051548 ] Ming Ma commented on YARN-4422: --- Thanks [~eepayne]. > Generic AHS sometimes doesn't show started, node, or logs on App page > - > > Key: YARN-4422 > URL: https://issues.apache.org/jira/browse/YARN-4422 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.3 > > Attachments: AppAttemptPage no container or node.jpg, AppPage no logs > or node.jpg, YARN-4422.001.patch > > > Sometimes the AM container for an app isn't able to start the JVM. This can > happen if bogus JVM options are given to the AM container ( > {{-Dyarn.app.mapreduce.am.command-opts=-InvalidJvmOption}}) or when > misconfiguring the AM container's environment variables > ({{-Dyarn.app.mapreduce.am.env="JAVA_HOME=/foo/bar/baz}}) > When the AM container for an app isn't able to start the JVM, the Application > page for that application shows {{N/A}} for the {{Started}}, {{Node}}, and > {{Logs}} columns. It _does_ have links for each app attempt, and if you click > on one of them, you go to the Application Attempt page, where you can see all > containers with links to their logs and nodes, including the AM container. > But none of that shows up for the app attempts on the Application page. > Also, on the Application Attempt page, in the {{Application Attempt > Overview}} section, the {{AM Container}} value is {{null}} and the {{Node}} > value is {{N/A}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051459#comment-15051459 ] Xuan Gong commented on YARN-3623: - +1 Checking this in > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051542#comment-15051542 ] Kuhu Shukla commented on YARN-4311: --- [~eepayne], request for comments and review. Thanks a lot. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, > YARN-4311-v3.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051463#comment-15051463 ] Xuan Gong commented on YARN-3623: - Committed the addendum patch in trunk/branch-2. Thanks, junping. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3623: - Attachment: YARN-3623-addendum.patch Sounds good. I have put an addendum patch here. Please check if it makes sense. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051335#comment-15051335 ] Naganarasimha G R commented on YARN-3623: - +1 lgtm > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051332#comment-15051332 ] Sunil G commented on YARN-4108: --- Hi [~leftnoteasy], thank you for sharing the detailed doc and patch. It's a wonderful effort and it came out nicely. I mainly went through the doc, and checked a few parts of the patch. I will go through the patch in detail soon and share more comments if I have any. A few major doubts: 1. With the different {{PreemptionType}}s, are we planning to handle preemption across queues, within a queue (fifo/priority), within a user, etc.? YARN-2009 was trying to handle preemption within a queue adhering to priority. 2. Currently all containers from a node are selected, and we try to find which of them match the preemption type; later {{selectContainersToPreempt}} helps to clear out the invalid containers. It would be a great help if some interface provided the flexibility to sort containers, even though a great deal of validation is done. The sorting parameter could be: submitted time; priority of the app (since we take a bunch of containers from a node first, only a few apps in the cluster will come in one shot); priority of the containers; or time remaining for the container to finish (% of completion). With this flexibility, users can tune which containers will be their first choice for preemption, provided all the size/user-limit/locality constraints are matched. 3. Could we get a choice to kill containers based on data locality? I could see the changes, but couldn't see how it is achieved on the preemption manager end. > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-v1.pdf, YARN-4108.poc.1.patch > > > This is a sibling JIRA for YARN-2154. 
We should make sure container preemption > is more effective. > *Requirements:* > 1) Can handle the case of user-limit preemption > 2) Can handle the case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), > cross-application preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051419#comment-15051419 ] Junping Du commented on YARN-3623: -- Yes. That's the flexibility we want here - we may bring up an old-version ATS service for legacy running apps during an upgrade. So we should pay more attention here, given that configurations (especially in yarn-default.xml) serve as a public protocol to Hadoop users. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1856: Attachment: YARN-1856.004.patch {quote} Should add all the configs to yarn-default.xml, saying they are still early configs? {quote} I don't think we've figured out how to specify the various resource isolation pieces from a config perspective. I'd like to keep them private for now, and I'll file a follow-up JIRA to document the configs once we've figured it out. The remaining points all relate to this, so I'll address them as part of that JIRA. {quote} ResourceHandlerModule - Formatting of new code is a little off: the declaration of getCgroupsMemoryResourceHandler(). There are other occurrences like this in that class before in this patch, you may want to fix those. {quote} Fixed. {quote} BUG! getCgroupsMemoryResourceHandler() incorrectly locks DiskResourceHandler instead of MemoryResourceHandler. CGroupsMemoryResourceHandlerImpl {quote} Not a bug, but a *bad* typo nonetheless. Fixed. {quote} What is this doing? {{ CGroupsHandler.CGroupController MEMORY = CGroupsHandler.CGroupController.MEMORY; }} Is it forcing a class-load or something? Not sure if this is needed. If this is needed, you may want to add a comment here. {quote} No, it's just a shorthand to avoid specifying the entire qualified variable every time. {quote} NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC -> NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERCENTAGE. Similarly the default constant. {quote} Fixed. {quote} CGROUP_PARAM_MEMORY_HARD_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SWAPPINESS can all be static and final. {quote} Interface variables are public static final by default. Any reason you want to add static final? 
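On that last point: fields declared in a Java interface are implicitly public, static, and final, which is why writing the modifiers explicitly is redundant. A minimal illustration (the interface and field names below are made up for the example, not the actual patch code):

```java
import java.lang.reflect.Modifier;

// Hypothetical interface mirroring the style discussed above: no modifiers
// are written on the field, yet the language makes it public static final.
interface CGroupParamsSketch {
    String CGROUP_PARAM_MEMORY_SWAPPINESS = "memory.swappiness";
}

class InterfaceFieldCheck {
    // Verifies via reflection that the interface field carries the
    // implicit public static final modifiers.
    static boolean isImplicitConstant() {
        try {
            int m = CGroupParamsSketch.class
                    .getField("CGROUP_PARAM_MEMORY_SWAPPINESS")
                    .getModifiers();
            return Modifier.isPublic(m)
                && Modifier.isStatic(m)
                && Modifier.isFinal(m);
        } catch (NoSuchFieldException e) {
            return false;
        }
    }
}
```

Since the modifiers are implied, adding `static final` to such constants changes nothing at compile time; it is purely a readability choice.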
> cgroups based memory monitoring for containers > -- > > Key: YARN-1856 > URL: https://issues.apache.org/jira/browse/YARN-1856 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla >Assignee: Varun Vasudev > Attachments: YARN-1856.001.patch, YARN-1856.002.patch, > YARN-1856.003.patch, YARN-1856.004.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-4358: -- Attachment: YARN-4358.addendum-2.patch Updated based on [~subru]'s suggestion > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, > YARN-4358.addendum-2.patch, YARN-4358.addendum.patch, YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented some (e.g., max-instantaneous resources) > are easily represented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050449#comment-15050449 ] Rohith Sharma K S commented on YARN-3226: - Sorry for coming in late; I am looking into the UI part of the code. The UI looks good, and I tested it in a one-node cluster for CS as well. One nit: can we name the heading {{ClusterNodesMetrics}} instead of {{ClusterMetricsOnMetrics}}? > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4438) Implement RM leader election with curator
[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050391#comment-15050391 ] Karthik Kambatla commented on YARN-4438: Would very much like for us to use Curator for leader election. Maybe HDFS could also do the same in the future. Quickly skimmed through the patch. High-level comments: # IIRC we use the same ZK quorum for both leader election and the store. Can we re-use the CuratorFramework so leader election and store operations are fully consistent? Otherwise, the separate clients (and their individual timeouts etc.) could lead to inconsistencies. # Would it be possible to hide the implementation of the leader election - ActiveStandbyElector vs CuratorElector - behind EmbeddedElector, so that AdminService and the RM don't need to know the details? In any case, having written some Curator code in the past, I would like to review the code more closely. > Implement RM leader election with curator > - > > Key: YARN-4438 > URL: https://issues.apache.org/jira/browse/YARN-4438 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4438.1.patch > > > This is to implement the leader election with curator instead of the > ActiveStandbyElector from common package, this also avoids adding more > configs in common to suit RM's own needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
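Point 2 above can be sketched as a small interface that hides the election mechanism. The class names below are illustrative stand-ins, not the actual YARN or Curator types:

```java
// Minimal sketch: keep the election mechanism (ActiveStandbyElector or a
// Curator-based elector) behind one interface so callers like AdminService
// never see the implementation. All names here are illustrative.
interface ElectorSketch {
    void enterElection();
    boolean isLeader();
}

// Stand-in for a Curator-backed implementation; a real one would wrap a
// shared CuratorFramework/LeaderLatch rather than flip a local flag.
class CuratorElectorSketch implements ElectorSketch {
    private volatile boolean leader;
    @Override public void enterElection() { leader = true; }
    @Override public boolean isLeader() { return leader; }
}

class ElectorDemo {
    public static void main(String[] args) {
        ElectorSketch elector = new CuratorElectorSketch(); // impls are swappable
        elector.enterElection();
        System.out.println(elector.isLeader());
    }
}
```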
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051679#comment-15051679 ] Hadoop QA commented on YARN-4416: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 1s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 20s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 1 new FindBugs issues. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 53s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 146m 48s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getAbsoluteUsedCapacity() is unsynchronized,
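The FindBugs warning above ({{getAbsoluteUsedCapacity()}} is unsynchronized) is the flip side of the deadlock this JIRA addresses: a {{synchronized}} getter can participate in a lock-ordering cycle between a queue and its parent. A minimal, non-YARN sketch of the resulting pattern, with illustrative names, where the getter reads a volatile field instead of taking the monitor:

```java
// Illustrative sketch (not actual YARN code): making the getter read a
// volatile field means no monitor is acquired on the read path, so it cannot
// take part in a lock-ordering deadlock; writes remain synchronized.
class CSQueueSketch {
    private volatile float absoluteUsedCapacity; // safe to read without locking

    // Unsynchronized getter: no monitor acquired, no lock cycle possible here.
    float getAbsoluteUsedCapacity() {
        return absoluteUsedCapacity;
    }

    synchronized void setAbsoluteUsedCapacity(float v) {
        absoluteUsedCapacity = v;
    }
}
```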
[jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4414: --- Attachment: YARN-4414.1.patch > Nodemanager connection errors are retried at multiple levels > > > Key: YARN-4414 > URL: https://issues.apache.org/jira/browse/YARN-4414 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Jason Lowe >Assignee: Chang Li > Attachments: YARN-4414.1.patch > > > This is related to YARN-3238. Ran into more scenarios where connection > errors are being retried at multiple levels, like NoRouteToHostException. > The fix for YARN-3238 was too specific, and I think we need a more general > solution to catch a wider array of connection errors that can occur to avoid > retrying them both at the RPC layer and at the NM proxy layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
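A minimal, non-YARN sketch of the layered-retry problem described above: when both the RPC layer and the NM proxy layer retry the same connection error, the attempts multiply.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: two nested retry(3, ...) layers perform up to
// 3 * 3 = 9 underlying attempts for one logical call, which is the
// multiplication this JIRA wants to avoid.
class NestedRetryDemo {
    interface Op { void run() throws Exception; }

    // Retry op up to `attempts` times, rethrowing the last failure.
    static void retry(int attempts, Op op) throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try { op.run(); return; } catch (Exception e) { last = e; }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Op failing = () -> { calls.incrementAndGet(); throw new Exception("no route to host"); };
        try {
            retry(3, () -> retry(3, failing)); // proxy-layer retry wrapping RPC-layer retry
        } catch (Exception ignored) { }
        System.out.println(calls.get()); // 9 underlying attempts for one logical call
    }
}
```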
[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051966#comment-15051966 ] Li Lu commented on YARN-4356: - I can see in the v.6 patch both concerns are addressed. Any other comments/suggestions/concerns? > ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.002.patch, > YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, > YARN-4356-feature-YARN-2928.005.patch, YARN-4356-feature-YARN-2928.006.patch, > YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4414: --- Attachment: YARN-4414.1.2.patch > Nodemanager connection errors are retried at multiple levels > > > Key: YARN-4414 > URL: https://issues.apache.org/jira/browse/YARN-4414 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Jason Lowe >Assignee: Chang Li > Attachments: YARN-4414.1.2.patch, YARN-4414.1.patch > > > This is related to YARN-3238. Ran into more scenarios where connection > errors are being retried at multiple levels, like NoRouteToHostException. > The fix for YARN-3238 was too specific, and I think we need a more general > solution to catch a wider array of connection errors that can occur to avoid > retrying them both at the RPC layer and at the NM proxy layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4356: -- Attachment: YARN-4356-feature-YARN-2928.006.patch Posted patch v.6. Addressed Junping's comment for moving the YarnConfiguration to the argument for PerNodeAuxService and TimelineReaderServer. Also, per Naga's comment I deprecated RM_SYSTEM_METRIC_PUBLISHER_ENABLED. > ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.002.patch, > YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, > YARN-4356-feature-YARN-2928.005.patch, YARN-4356-feature-YARN-2928.006.patch, > YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051956#comment-15051956 ] Li Lu commented on YARN-4224: - I scanned through the patch. The patch itself looks fine, but I would like to check with the broader community about the patterns proposed in this JIRA. Since this is blocking the ongoing web UI work, we really want to have a relatively stable interface before we proceed with changing the UI side. IIUC we're putting the mandatory parameters in a hierarchical order in the URL, and adding optional parameters as query parameters. This approach looks fine to me. For naming conventions, we really want to be consistent with the rest of the codebase. [~wangda] any suggestions/comments here, given this is quite related to the next-gen UI? Thanks! > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
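The convention discussed above (mandatory parameters in the URL path hierarchy, optional ones as query parameters) can be sketched as follows. The endpoint paths are hypothetical, not the actual ATSv2 REST routes:

```java
// Illustrative sketch: mandatory parts (cluster, app) form the path
// hierarchy; optional filters (limit) are appended as query parameters.
class RestConventionDemo {
    static String entityUrl(String cluster, String app, Integer limit) {
        String url = "/ws/v2/timeline/clusters/" + cluster + "/apps/" + app; // hypothetical path
        if (limit != null) {                    // optional parameter -> query param
            url += "?limit=" + limit;
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(entityUrl("c1", "app_1", 10));
        // -> /ws/v2/timeline/clusters/c1/apps/app_1?limit=10
    }
}
```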
[jira] [Updated] (YARN-3817) [Aggregation] Flow and User level aggregation on Application States table
[ https://issues.apache.org/jira/browse/YARN-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3817: Attachment: YARN-3817-poc-v1-rebase.patch Rebase the POC patch to the feature-YARN-2928 branch. The new patch is based on YARN-3816-feature-YARN-2928.v4.1.patch. > [Aggregation] Flow and User level aggregation on Application States table > - > > Key: YARN-3817 > URL: https://issues.apache.org/jira/browse/YARN-3817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Attachments: Detail Design for Flow and User Level Aggregation.pdf, > YARN-3817-poc-v1-rebase.patch, YARN-3817-poc-v1.patch > > > We need time-based flow/user level aggregation to present flow/user related > states to end users. > Flow level represents summary info of a specific flow. User level aggregation > represents summary info of a specific user, it should include summary info of > accumulated and statistic means (by two levels: application and flow), like: > number of Flows, applications, resource consumption, resource means per app > or flow, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4219) New levelDB cache storage for timeline v1.5
[ https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4219: Attachment: YARN-4219-YARN-4265.003.patch Fixed the synchronization issue on service stop. > New levelDB cache storage for timeline v1.5 > --- > > Key: YARN-4219 > URL: https://issues.apache.org/jira/browse/YARN-4219 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4219-YARN-4265.001.patch, > YARN-4219-YARN-4265.002.patch, YARN-4219-YARN-4265.003.patch, > YARN-4219-trunk.001.patch, YARN-4219-trunk.002.patch, > YARN-4219-trunk.003.patch > > > We need to have an "offline" caching storage for timeline server v1.5 after > the changes in YARN-3942. The in memory timeline storage may run into OOM > issues when used as a cache storage for entity file timeline storage. We can > refactor the code and have a level db based caching storage for this use > case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050551#comment-15050551 ] Junping Du commented on YARN-3623: -- Ok. I think I already found the answer in YARN-4234: ATS 1.5 depends on this patch. Will go ahead and commit it to 2.8. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
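A sketch of the idea behind this JIRA: frameworks read one version config instead of carrying their own v1/v2 flags. Plain {{java.util.Properties}} stands in for Hadoop's Configuration here, and the exact constant names and defaults in YarnConfiguration may differ:

```java
import java.util.Properties;

// Illustrative sketch: a single timeline-service version knob. Switching a
// cluster from v1 to v1.5/v2 then means changing this one value, not a
// per-framework flag. Key name and default are assumptions for this sketch.
class TimelineVersionDemo {
    static final String TIMELINE_SERVICE_VERSION = "yarn.timeline-service.version"; // assumed key
    static final String DEFAULT_VERSION = "1.0";

    static float getVersion(Properties conf) {
        return Float.parseFloat(conf.getProperty(TIMELINE_SERVICE_VERSION, DEFAULT_VERSION));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getVersion(conf));      // default: 1.0
        conf.setProperty(TIMELINE_SERVICE_VERSION, "1.5");
        System.out.println(getVersion(conf));      // configured: 1.5
    }
}
```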
[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050645#comment-15050645 ] Junping Du commented on YARN-4356: -- Thanks [~sjlee0] for updating the patch! bq. This is the java 7 diamond operator (<>) which is a shorthand for inferring types. The type information is NOT removed. It's inferred by the compiler, and the compiler produces the same bytecode as specifying the types. Got it. Sounds like my coffee is stale and I need a new cup. :) bq. Got it. Can we proceed with the current patch and get that fix once YARN-3586 goes in? Sure. Just a reminder here, as I also tend to forget it myself. bq. This is addressing a javadoc error. The ampersand ("&") is a special character for javadoc, and it breaks javadoc. It needs to be entity-escaped. Oh, I see. Let's keep it here. bq. This is used for the test method launchServer(). This method is invoked directly by a unit test (thus the @VisibleForTesting annotation). The same for TimelineReaderServer. If so, I would prefer not to update the configuration in launchServer(), but to pass an updated configuration to the method instead, just as other daemons (RM, NM) do. The reason is that it could be hard to track the real configurations for daemons/services if we override them in internal logic. Maybe we should follow the same practice? bq. That's fine. I still put up the patch that includes a version of that because without it things won't even compile. I will wait until YARN-3623 goes in before I remove that piece from this patch, then this can get committed. Sounds good. I just committed YARN-3623 to trunk. Everything else looks good to me in the 005 patch. 
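For reference, the diamond operator discussed in the first quote can be demonstrated directly; both maps below have exactly the same type and contents:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The Java 7 diamond operator: the compiler infers the type arguments from
// the left-hand side, so no type information is lost and the same bytecode
// is produced as with the explicit form.
class DiamondDemo {
    static Map<String, List<Integer>> explicitDemo() {
        Map<String, List<Integer>> m = new HashMap<String, List<Integer>>(); // pre-Java-7 style
        m.put("a", new ArrayList<Integer>());
        return m;
    }

    static Map<String, List<Integer>> diamondDemo() {
        Map<String, List<Integer>> m = new HashMap<>(); // diamond: types inferred
        m.put("a", new ArrayList<>());
        return m;
    }

    public static void main(String[] args) {
        // Same contents, same runtime behavior: the diamond is pure shorthand.
        System.out.println(explicitDemo().equals(diamondDemo())); // true
    }
}
```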
> ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.002.patch, > YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, > YARN-4356-feature-YARN-2928.005.patch, > YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3623: - Target Version/s: 2.8.0 (was: YARN-2928) > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050641#comment-15050641 ] Hadoop QA commented on YARN-4309: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 46s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s {color} | {color:red} Patch generated 3 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 358, now 358). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 46s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 54s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 20s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | |
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050496#comment-15050496 ] Junping Du commented on YARN-3623: -- Oh. Just noticed the target version is YARN-2928, but I think ATS 1.5 needs this too, doesn't it? Shall we commit the patch to trunk/branch-2? > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050495#comment-15050495 ] Varun Vasudev commented on YARN-2934: - [~Naganarasimha] - one minor comment - can we make the tail size a config parameter? Currently it's hard-coded in the code. > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, > YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch > > > Most YARN applications redirect stderr to some file. That's why when > container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
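The suggestion above (replace the hard-coded tail size with a config key plus a default) could look roughly like this; the key name is hypothetical and {{java.util.Properties}} stands in for Hadoop's Configuration:

```java
import java.util.Properties;

// Illustrative sketch: a configurable stderr tail size with a default,
// instead of a value hard-coded in the launcher. Key name and default are
// assumptions for this sketch, not actual YARN configuration.
class TailSizeConfigDemo {
    static final String TAIL_SIZE_KEY = "yarn.nodemanager.container.stderr.tail-bytes"; // hypothetical
    static final long DEFAULT_TAIL_SIZE = 4096L;

    static long getTailSize(Properties conf) {
        return Long.parseLong(conf.getProperty(TAIL_SIZE_KEY, Long.toString(DEFAULT_TAIL_SIZE)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getTailSize(conf));   // 4096 (default when unset)
        conf.setProperty(TAIL_SIZE_KEY, "1024");
        System.out.println(getTailSize(conf));   // 1024 (configured)
    }
}
```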
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050592#comment-15050592 ] Hudson commented on YARN-3623: -- FAILURE: Integrated in Hadoop-trunk-Commit #8950 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8950/]) YARN-3623. Add a config to indicate the Timeline Service version. (junping_du: rev f910e4f639dc311fcb257bfcb869b1aa8b2c0643) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050887#comment-15050887 ] Naganarasimha G R commented on YARN-2934: - Hi [~vvasudev], as per the [comment|https://issues.apache.org/jira/browse/YARN-2934?focusedCommentId=14983970=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14983970] from [~steve_l], he felt there was little chance that someone would configure it, but I am open to adding an additional configuration if that is not an issue! > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, > YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch > > > Most YARN applications redirect stderr to some file. That's why when > container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050492#comment-15050492 ] Hadoop QA commented on YARN-1856: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 14s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 49s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 47s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 47s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 36s {color} | {color:red} Patch generated 11 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 240, now 248). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 26s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 33s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 7s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 0s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776756/YARN-1856.004.patch | | JIRA Issue | YARN-1856 | | Optional Tests | asflicense
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050494#comment-15050494 ] Junping Du commented on YARN-3623: -- +1. Committing. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Description: While debugging in Eclipse, I came across a scenario where I had to find out the name of a queue, but every time I tried to inspect the queue it hung. On seeing the stack I realized there was a deadlock, but on analysis found that it was only due to *queue.toString()* during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure the following: was: While debugging in Eclipse, I came across a scenario where I had to find out the name of a queue, but every time I tried to inspect the queue it hung. On seeing the stack I realized there was a deadlock, but on analysis found that it was only due to *queue.toString()* during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized and would be better handled through read and write locks. > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in Eclipse, I came across a scenario where I had to find out > the name of a queue, but every time I tried to inspect the queue it hung. On > seeing the stack I realized there was a deadlock, but on analysis found that > it was only due to *queue.toString()* during debugging, as > {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need > to ensure the following: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Issue Type: Sub-task (was: Bug) Parent: YARN-3091 > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in Eclipse, I came across a scenario where I had to find out > the name of a queue, but every time I tried to inspect the queue it hung. On > seeing the stack I realized there was a deadlock, but on analysis found that > it was only due to *queue.toString()* during debugging, as > {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and would be better handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
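The read/write-lock approach suggested in the description could look roughly like the following. This is a minimal sketch with hypothetical field and class names, not the actual AbstractCSQueue code from any patch:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: guard queue state with a ReentrantReadWriteLock instead of
// synchronizing every getter, so concurrent readers (including a
// debugger's toString() call) never block each other; only writers
// take the exclusive lock.
public class QueueSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity = 0f;

  public float getAbsoluteUsedCapacity() {
    lock.readLock().lock(); // many readers may hold this simultaneously
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  public void setAbsoluteUsedCapacity(float value) {
    lock.writeLock().lock(); // writers are exclusive
    try {
      this.absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Since getters dominate on the scheduler's hot paths, this choice trades the single monitor for a lock that lets read-mostly traffic proceed in parallel.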
[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051306#comment-15051306 ] Hadoop QA commented on YARN-4356: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 12 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 28s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 7s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 55s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 27s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 3s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 3m 18s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 54s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in feature-YARN-2928 has 3 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 50s {color} | {color:red} hadoop-yarn-common in feature-YARN-2928 failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 42s {color} | {color:red} hadoop-yarn-server-resourcemanager in feature-YARN-2928 failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 59s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 13s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 36m 15s {color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 5 new issues (was 779, now 779). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 55s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 49m 10s {color} | {color:red} root-jdk1.7.0_91 with JDK v1.7.0_91 generated 5 new issues (was 772, now 772). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 55s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 29s {color} | {color:red} Patch generated 7 new checkstyle issues in root (total was 1970, now 1942). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 6m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 3m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 14m 9s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 7m 8s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 100, now 100). {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 7m 8s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 100, now 100). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 49s {color} | {color:green} the patch passed with
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051264#comment-15051264 ] Sangjin Lee commented on YARN-3623: --- That's why I asked whether you were OK with the wording. How about "it means the cluster should bring up the timeline service specified by the version" (dropping the exclusive phrase)? > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4424) Fix deadlock in RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051309#comment-15051309 ] Junping Du commented on YARN-4424: -- Hi [~leftnoteasy] and [~jianhe], Is this blocker for 2.6.3? If so, we should commit it to 2.6.3. > Fix deadlock in RMAppImpl > - > > Key: YARN-4424 > URL: https://issues.apache.org/jira/browse/YARN-4424 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Jian He >Priority: Blocker > Fix For: 2.7.2, 2.6.3 > > Attachments: YARN-4424.1.patch > > > {code} > yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn > application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING > 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: > http://XXX:8188/ws/v1/timeline/ > 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at > XXX/0.0.0.0:8050 > 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History > server at XXX/0.0.0.0:10200 > {code} > {code:title=RM log} > 2015-12-04 21:59:19,744 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000 > 2015-12-04 22:00:50,945 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000 > 2015-12-04 22:02:22,416 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000 > 2015-12-04 22:03:53,593 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 24 > 2015-12-04 22:05:24,856 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000 > 2015-12-04 22:06:56,235 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000 > 2015-12-04 22:08:27,510 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000 > 2015-12-04 22:09:58,786 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of 
event-queue is 244000 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051377#comment-15051377 ] Li Lu commented on YARN-3623: - I'd like to make sure I understand this change correctly. Specifically, what kinds of new cases do we allow by removing the "nothing else" part? IIUC, this allows us to bring up an old ATS server after the system has been upgraded. This looks fine to me. Am I missing some other cases introduced by this change? > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
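For reference, the setting this JIRA introduces is read from yarn-site.xml; a cluster pinned to timeline service v.1.5 would carry something like the fragment below. The property name matches the committed patch, while the value and description text here are illustrative:

```xml
<property>
  <name>yarn.timeline-service.version</name>
  <value>1.5</value>
  <description>Indicates to clients which version of the timeline
  service the cluster is running, so that frameworks can switch from
  v1 to v2 by changing this one value instead of carrying their own
  per-framework version configs.</description>
</property>
```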
[jira] [Commented] (YARN-4438) Implement RM leader election with curator
[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052001#comment-15052001 ] Jian He commented on YARN-4438: --- bq. Otherwise, the clients (and their individual timeouts etc.) could lead to inconsistencies? Agreed; I was actually discussing the same with [~xgong] offline, and didn't do it only because I wanted to keep the change minimal. I'll make the change accordingly. bq. Would it be possible to hide the implementation of the leader-election - ActiveStandbyElector vs CuratorElector - behind EmbeddedElector? I'm thinking of removing EmbeddedElectorService later on and separating the LeaderElectorService out from the AdminService. Does that make sense? > Implement RM leader election with curator > - > > Key: YARN-4438 > URL: https://issues.apache.org/jira/browse/YARN-4438 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4438.1.patch > > > This is to implement the leader election with curator instead of the > ActiveStandbyElector from common package, this also avoids adding more > configs in common to suit RM's own needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052022#comment-15052022 ] Vinod Kumar Vavilapalli commented on YARN-4224: --- We should model this around resources (as REST specifies) instead of around queries. Special purpose queries can be treated as shortcuts to existing resource hierarchy. Also, user is missing from the above set of examples. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052026#comment-15052026 ] Lin Yiqun commented on YARN-4399: - I tested the failed unit tests and the errors are not related to the patch. > FairScheduler allocated container should resetSchedulingOpportunities count > of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch, YARN-4399.002.patch, > YARN-4399.003.patch > > > There is a bug in FairScheduler's container allocation when the locality > configs are set. When the scheduler attempts to assign a container, it > invokes {{FSAppAttempt#addSchedulingOpportunity}} whether or not the > assignment succeeds. If yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack are configured, the > schedulingOpportunity value influences the locality of containers: when one > container is assigned successfully, the schedulingOpportunity count of its > priority is increased, and it is increased again for the second container. > This may degrade the allowedLocality of that priority, so the container ends > up handled as a rack request. So I think that when FairScheduler allocates a > container, if the previous container was assigned, the scheduling count of > its priority should be reset to 0, so that its value does not influence > container allocation in the next iteration; this will increase the locality > of containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
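The behaviour the description argues for can be modelled with a tiny counter sketch. The class and method names here are hypothetical; the real logic lives in FSAppAttempt:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed behaviour: a successful assignment resets the
// scheduling-opportunity counter for that priority, so the next container
// at the same priority is not pushed down toward RACK_LOCAL by
// opportunities accumulated before the previous success.
public class SchedulingOpportunities {
  private final Map<Integer, Integer> counts = new HashMap<>();

  // Called on every scheduling attempt at this priority.
  public void addOpportunity(int priority) {
    counts.merge(priority, 1, Integer::sum);
  }

  // Proposed fix: clear the counter once a container is actually placed.
  public void resetOnSuccessfulAssignment(int priority) {
    counts.put(priority, 0);
  }

  public int get(int priority) {
    return counts.getOrDefault(priority, 0);
  }
}
```

Without the reset, the count only ever grows, and the locality-threshold comparison keeps seeing a large value even after a node-local placement succeeded.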
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052038#comment-15052038 ] Li Lu commented on YARN-4224: - And, BTW, under the current resource model of ATS v2, a full path to locate an entity could look like: {code} /clusters/{clusterid}/users/{userid}/flows/{flowid}/flowruns/{flowrunid}/apps/{appid}/entities/{entityid} {code} Any stage in between will return the info of a specific entity (cluster, user, flow, flowrun, app), or list the next level of resources under it. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051998#comment-15051998 ] Naganarasimha G R commented on YARN-4350: - [~sjlee0], can you cherry-pick YARN-2859 into our branch, so that merging the branch will be simpler? > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These tests fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052030#comment-15052030 ] Li Lu commented on YARN-4224: - OK, I've got a few references for the discussion. I looked at the WebHDFS REST APIs, but the use case there is not quite similar to ours. The RM REST APIs mostly have only one mandatory parameter, such as "/apps/{appid}/appattempt". The AHS web services are probably the most similar use case here, so we can borrow much of their resource model. For multiple parameters we organize them as an ordered sequence, each following its parameter name, such as "/apps/{appid}/appattempts/{appattemptid}/containers/{containerid}". Any API that does not end on a parameter (such as "/apps/{appid}/appattempts") is treated as a list. This appears to be the typical resource model in YARN; the MapReduce AMWebService is another example of this. Another thing: for special queries like flowapps, we can add them as shortcuts on the flow level, such as "/cluster/{clusterid}/user/{userid}/flow/{flowid}/apps". Could somebody please remind me why we decided to remove user from the path? Thanks! > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
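The hierarchical layout being discussed can be illustrated with a small path-building helper. This is purely illustrative (the class and method are hypothetical, not part of any patch); it just shows how every level follows the "collection-name/{id}" convention:

```java
// Illustrative helper that composes the hierarchical ATSv2 reader path
// discussed above: each resource level is a collection name followed by
// an id, e.g. /clusters/{clusterid}/users/{userid}/...
public class TimelinePath {
  public static String entityPath(String cluster, String user, String flow,
      String flowRun, String app, String entity) {
    // The leading "" produces the leading "/" when joined.
    return String.join("/",
        "", "clusters", cluster, "users", user, "flows", flow,
        "flowruns", flowRun, "apps", app, "entities", entity);
  }
}
```

Truncating the join at any level would yield the intermediate resources ("/clusters/c1/users/u1/flows/f1", etc.), matching the "each stage returns an entity or lists the next level" model.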
[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052043#comment-15052043 ] Jun Gong commented on YARN-3998: [~vvasudev] Thanks for the attention and suggestion. We have a patch for it that implements the "restart on all errors" policy. I will attach the patch if that is OK. > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > > I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM > launches containers, it could specify the value. Then the NM will re-launch > the container 'retry-times' times when it fails to run (e.g. exit code is not > 0). > It will save a lot of time: it avoids repeating container localization, the > RM does not need to re-schedule the container, and local files in the > container's working directory are left for re-use. (If the container has > downloaded some big files, it does not need to re-download them when running > again.) > We find it is useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
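The "restart on all errors" policy described here amounts to a bounded retry loop around the launch. A minimal sketch, using a hypothetical launch callback rather than the actual NM interfaces:

```java
import java.util.function.IntSupplier;

// Sketch: re-run a container launch up to retryTimes extra attempts
// while it keeps exiting non-zero, mirroring the proposed
// "restart on all errors" policy. The working directory is assumed to
// be kept between attempts, so localized files are reused.
public class RelaunchSketch {
  public static int launchWithRetries(IntSupplier launch, int retryTimes) {
    int exitCode = launch.getAsInt();      // first attempt
    int attemptsLeft = retryTimes;
    while (exitCode != 0 && attemptsLeft-- > 0) {
      exitCode = launch.getAsInt();        // relaunch in place, no re-localization
    }
    return exitCode;                       // 0 on success, last exit code otherwise
  }
}
```

A richer policy (e.g. "restart only on specific exit codes", or with back-off) would replace the `exitCode != 0` predicate, which is where the per-policy configuration Varun suggests would plug in.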
[jira] [Commented] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052096#comment-15052096 ] Subru Krishnan commented on YARN-4340: -- [~seanpo03], I noticed there are a couple of minor checkstyle & Javadoc issues. Can you kindly fix those? Also, *TestReservationInputValidator::testSubmitReservationDoesnotExist* is failing; it seems like a minor fix. It would be great if you could try running it e2e on at least a single-node setup. Thanks for working on this. > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch, > YARN-4340.v6.patch, YARN-4340.v7.patch, YARN-4340.v8.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id". > YARN-4420 has a dependency on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051985#comment-15051985 ] Naganarasimha G R commented on YARN-3946: - Hi [~wangda], Javadoc, checkstyle and unit test case issues are either not related to the patch or not valid to be taken care of. > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, > YARN-3946.v1.007.patch, YARN-3946.v1.008.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through the RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4440: Attachment: YARN-4440.002.patch Resolved the remaining error. > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4440.001.patch, YARN-4440.002.patch > > > It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} > method > {code} > // default level is NODE_LOCAL > if (! allowedLocalityLevel.containsKey(priority)) { > allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL); > return NodeType.NODE_LOCAL; > } > {code} > On the first invocation, the method doesn't init the time in > lastScheduledContainer, which makes the following code run on the next > invocation: > {code} > // check waiting time > long waitTime = currentTimeMs; > if (lastScheduledContainer.containsKey(priority)) { > waitTime -= lastScheduledContainer.get(priority); > } else { > waitTime -= getStartTime(); > } > {code} > Here waitTime falls back to currentTimeMs minus the FsApp startTime, and > since the startTime is much earlier than currentTimeMs, the wait easily > exceeds the delay time and allowedLocality degrades. So we should record the > initial time for the priority to avoid comparing against the FsApp startTime > and degrading allowedLocalityLevel. This problem has a bigger negative impact > on small jobs. YARN-4399 also discusses some locality problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
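The degradation described in YARN-4440 above can be sketched in isolation. This is a minimal model with hypothetical names (LocalityDelaySketch and its methods are not the real FSAppAttempt code): the buggy variant never records a timestamp for a newly seen priority, so the next wait-time computation falls back to the app's start time and immediately exceeds the locality delay; the fixed variant records the time the priority is first seen.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of FSAppAttempt's locality wait computation. */
class LocalityDelaySketch {
  enum NodeType { NODE_LOCAL, RACK_LOCAL }

  private final Map<Integer, Long> lastScheduledContainer = new HashMap<>();
  private final Map<Integer, NodeType> allowedLocalityLevel = new HashMap<>();
  private final long startTime;

  LocalityDelaySketch(long startTime) { this.startTime = startTime; }

  /** Buggy variant: first call returns NODE_LOCAL but never records a timestamp. */
  NodeType getAllowedLocalityLevelBuggy(int priority, long nowMs, long delayMs) {
    if (!allowedLocalityLevel.containsKey(priority)) {
      allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
      return NodeType.NODE_LOCAL; // bug: lastScheduledContainer not initialized
    }
    return check(priority, nowMs, delayMs);
  }

  /** Fixed variant: record the time the priority is first seen. */
  NodeType getAllowedLocalityLevelFixed(int priority, long nowMs, long delayMs) {
    if (!allowedLocalityLevel.containsKey(priority)) {
      allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
      lastScheduledContainer.put(priority, nowMs); // the fix
      return NodeType.NODE_LOCAL;
    }
    return check(priority, nowMs, delayMs);
  }

  private NodeType check(int priority, long nowMs, long delayMs) {
    // Without an entry, waitTime is measured from app start, not last schedule.
    long waitTime = nowMs - lastScheduledContainer.getOrDefault(priority, startTime);
    if (waitTime > delayMs) {
      allowedLocalityLevel.put(priority, NodeType.RACK_LOCAL); // degrade
    }
    return allowedLocalityLevel.get(priority);
  }
}
```

The sketch only models the timing arithmetic; the actual patch also has to cover the full NODE_LOCAL → RACK_LOCAL → OFF_SWITCH degradation chain.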
[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism
[ https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052108#comment-15052108 ] Sidharta Seethana commented on YARN-3542: - Hi [~vvasudev], thanks for the updated patch. Please see comments below. h4. CGroupsCpuResourceHandlerImpl.java {code} import org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler; import org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler; import org.apache.hadoop.yarn.server.nodemanager.util.LCEResourcesHandler; {code} These are unused imports in CGroupsCpuResourceHandlerImpl. {code} @VisibleForTesting static final String CPU_PERIOD_US = "cfs_period_us"; @VisibleForTesting static final String CPU_QUOTA_US = "cfs_quota_us"; @VisibleForTesting static final String CPU_SHARES = "shares"; {code} Move these to CGroupsHandler? {code} int quotaUS = MAX_QUOTA_US; int periodUS = (int) (MAX_QUOTA_US / yarnProcessors); {code} About how shares/cfs_period_us/cfs_quota_us are used: additional comments/documentation and examples (as unit tests?) would be useful. It took me a while to trace through the code using some examples. h4. CGroupsLCEResourcesHandler.java Since the class has been marked deprecated, is it necessary to make the rest of the changes that are included? h4.
LinuxContainerExecutor.java {code} private LCEResourcesHandler getResourcesHandler(Configuration conf) { LCEResourcesHandler handler = ReflectionUtils.newInstance( conf.getClass(YarnConfiguration.NM_LINUX_CONTAINER_RESOURCES_HANDLER, DefaultLCEResourcesHandler.class, LCEResourcesHandler.class), conf); // Stop using CgroupsLCEResourcesHandler // use the resource handler chain instead // ResourceHandlerModule will create the cgroup cpu module if // CgroupsLCEResourcesHandler is set if (handler instanceof CgroupsLCEResourcesHandler) { handler = ReflectionUtils.newInstance(DefaultLCEResourcesHandler.class, conf); } handler.setConf(conf); return handler; } {code} Since all resource handling is now in the resource handler chain - there is no longer a need to have references to LCEResourcesHandler in LinuxContainerExecutor - all related config handling etc. should be in ResourceHandlerModule.java (which already seems to be the case). IMO, all references to LCEResourcesHandler (and sub-classes) should be removed from LinuxContainerExecutor. > Re-factor support for CPU as a resource using the new ResourceHandler > mechanism > --- > > Key: YARN-3542 > URL: https://issues.apache.org/jira/browse/YARN-3542 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana >Priority: Critical > Attachments: YARN-3542.001.patch, YARN-3542.002.patch, > YARN-3542.003.patch > > > In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier > addition of new resource types in the nodemanager (this was used for network > as a resource - See YARN-2140 ). We should refactor the existing CPU > implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using > the new ResourceHandler mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
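To illustrate the shares/cfs_period_us/cfs_quota_us comment above, here is a sketch of the arithmetic in the quoted snippet (hypothetical helper names; the MIN_PERIOD_US clamp is an assumption modeled on how CFS bandwidth values are typically bounded, not necessarily what the patch does): the quota is pinned at the 1-second kernel ceiling and the period is shrunk so that quota/period equals the number of processors YARN may use, i.e. processes in the cgroup get at most that many CPUs' worth of runtime per period.

```java
/** Hypothetical sketch of mapping a CPU limit to cfs_period_us / cfs_quota_us. */
class CfsLimitsSketch {
  static final int MAX_QUOTA_US = 1000 * 1000; // 1s ceiling on cfs_quota_us
  static final int MIN_PERIOD_US = 1000;       // assumed lower bound on the period

  /**
   * Returns {periodUS, quotaUS} such that quotaUS / periodUS == yarnProcessors,
   * mirroring the snippet from the review: quota stays at the max, the period
   * shrinks in proportion to the allowed processor count.
   */
  static int[] overallLimits(float yarnProcessors) {
    int quotaUS = MAX_QUOTA_US;
    int periodUS = (int) (MAX_QUOTA_US / yarnProcessors);
    if (periodUS < MIN_PERIOD_US) {
      // Very large processor counts: clamp the period, scale the quota instead,
      // preserving the same quota/period ratio.
      periodUS = MIN_PERIOD_US;
      quotaUS = (int) (yarnProcessors * MIN_PERIOD_US);
    }
    return new int[] { periodUS, quotaUS };
  }
}
```

For example, yarnProcessors = 2 yields period 500000µs with quota 1000000µs, i.e. a 2-CPU cap; unit tests of exactly this shape would answer the documentation request in the review.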
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052035#comment-15052035 ] Li Lu commented on YARN-4224: - Reformat: OK I've got a few references for the discussion. I looked at the WebHDFS REST APIs but the use case there is not quite similar to our use case here. The RM REST APIs mostly only have one mandatory parameter, such as {{/apps/\{appid\}/appattempt}}. AHS web services is probably the most similar use case here, so we can borrow much of its resource model. For multiple parameters we organize them as an ordered sequence, each value following its parameter name, such as {{/apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}}}. Any API that does not end on a parameter (such as {{/apps/\{appid\}/appattempts}}) is treated as a list. This appears to be the typical resource model in YARN. The MapReduce AMWebService is another example of this. Another thing is, for special queries like flowapps, we can add them as shortcuts on the flow level, such as {{/cluster/\{clusterid\}/user/\{userid\}/flow/\{flowid\}/apps}}. Could somebody please remind me why we decided to remove user from the path? Thanks! > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052067#comment-15052067 ] Subru Krishnan commented on YARN-4340: -- Thanks for addressing my feedback [~seanpo03]. The latest patch (v8) LGTM. [~curino], can you take a look and commit. > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch, > YARN-4340.v6.patch, YARN-4340.v7.patch, YARN-4340.v8.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id". > YARN-4420 has a dependency on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3226: -- Attachment: 0004-YARN-3226.patch Thank You [~rohithsharma] for the comments. Addressing the same in the new patch. Also addressing the comments given by [~djp] earlier. Kindly help to review. > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
[ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051245#comment-15051245 ] Naganarasimha G R commented on YARN-4415: - Hi [~wangda], bq. User doesn't need to update configurations a lot if new labels added (Assume partition will be shared to all queues) User has to change configurations a lot if new labels added (Assume partition will be shared to few queues only) Sorry, I was not able to get your thoughts here ... what's the difference you are trying to indicate between updating and changing configurations? If Maximum-capacity for partitions is set to 100, what needs to be modified? How is it different from the default max capacity configuration for the default partition? I understand guaranteed capacity needs to be set to zero, but why does max cap need to be modified when it's shared to a few queues only? > Scheduler Web Ui shows max capacity for the queue is 100% but when we submit > application doesnt get assigned > > > Key: YARN-4415 > URL: https://issues.apache.org/jira/browse/YARN-4415 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.2 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: App info with diagnostics info.png, > capacity-scheduler.xml, screenshot-1.png > > > Steps to reproduce the issue : > Scenario 1: > # Configure a queue(default) with accessible node labels as * > # create a exclusive partition *xxx* and map a NM to it > # ensure no capacities are configured for default for label xxx > # start an RM app with queue as default and label as xxx > # application is stuck but scheduler ui shows 100% as max capacity for that > queue > Scenario 2: > # create a nonexclusive partition *sharedPartition* and map a NM to it > # ensure no capacities are configured for default queue > # start an RM app with queue as *default* and label as *sharedPartition* > # application is stuck but scheduler ui shows 100% as max capacity for that > queue for *sharedPartition* > For both issues the cause is the same: default max capacity and abs max > capacity are set to zero % -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Description: While debugging in eclipse came across a scenario where in i had to get to know the name of the queue but every time i tried to see the queue it was getting hung. On seeing the stack realized there was a deadlock but on analysis found out that it was only due to *queue.toString()* during debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure following : # queueCapacity, resource-usage has their own read/write lock. # was: While debugging in eclipse came across a scenario where in i had to get to know the name of the queue but every time i tried to see the queue it was getting hung. On seeing the stack realized there was a deadlock but on analysis found out that it was only due to *queue.toString()* during debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure following : > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Hence we need to ensure following : > # queueCapacity, resource-usage has their own read/write lock. > # -- This message was sent by Atlassian JIRA (v6.3.4#6332)
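Point 1 above — giving queueCapacity/resource-usage a dedicated read/write lock instead of relying on the queue's synchronized methods — can be sketched as follows (hypothetical class, not the actual AbstractCSQueue change): readers such as getAbsoluteUsedCapacity() never take the queue monitor, so a debugger or log statement calling toString() cannot join a lock cycle with a thread that already holds another monitor.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: guard a queue's usage numbers with a dedicated RW lock. */
class QueueUsageSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity;

  /** Many concurrent readers; none of them holds the queue's monitor. */
  float getAbsoluteUsedCapacity() {
    lock.readLock().lock();
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  void setAbsoluteUsedCapacity(float v) {
    lock.writeLock().lock();
    try {
      absoluteUsedCapacity = v;
    } finally {
      lock.writeLock().unlock();
    }
  }

  @Override
  public String toString() {
    // Safe from a debugger: only the read lock is taken, never the monitor.
    return "usedCapacity=" + getAbsoluteUsedCapacity();
  }
}
```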
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051426#comment-15051426 ] Li Lu commented on YARN-3623: - Agree. +1 for the change. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052333#comment-15052333 ] Sangjin Lee commented on YARN-4350: --- I pushed that commit. Let me know if you need anything else. Thanks. > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051132#comment-15051132 ] Hadoop QA commented on YARN-4399: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 32s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 18m 45s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776824/YARN-4399.002.patch | | JIRA Issue | YARN-4399 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051133#comment-15051133 ] Hudson commented on YARN-3623: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #682 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/682/]) YARN-3623. Add a config to indicate the Timeline Service version. (junping_du: rev f910e4f639dc311fcb257bfcb869b1aa8b2c0643) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4442) create new application operation from the Rest API should be logged in the audit log
[ https://issues.apache.org/jira/browse/YARN-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051136#comment-15051136 ] Varun Vasudev commented on YARN-4442: - Any reason why it should be logged in the REST API but not for the RPC API? > create new application operation from the Rest API should be logged in the > audit log > > > Key: YARN-4442 > URL: https://issues.apache.org/jira/browse/YARN-4442 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > create new application operation from the Rest API > ("/ws/v1/cluster/apps/new-application") should be logged in > the audit log -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052179#comment-15052179 ] Sangjin Lee commented on YARN-4350: --- You mean commit [f114e728da6e19f3d35ff0cfef9fceea26aa5d46|https://github.com/apache/hadoop/commit/f114e728da6e19f3d35ff0cfef9fceea26aa5d46] specifically? > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. 
> (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS
[ https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052251#comment-15052251 ] Akira AJISAKA commented on YARN-3432: - bq. Rethinking this, it's better to change how to calculate availableMB in CapacityScheduler. availableMB should include reservedMB for consistency. or to change how to calculate availableMB in FairScheduler to exclude reservedMB. As [~skaterxu] commented, we need to take care of vcores as well. > Cluster metrics have wrong Total Memory when there is reserved memory on CS > --- > > Key: YARN-3432 > URL: https://issues.apache.org/jira/browse/YARN-3432 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Brahma Reddy Battula > Attachments: YARN-3432-002.patch, YARN-3432.patch > > > I noticed that when reservations happen when using the Capacity Scheduler, > the UI and web services report the wrong total memory. > For example. I have a 300GB of total memory in my cluster. I allocate 50 > and I reserve 10. The cluster metrics for total memory get reported as 290GB. > This was broken by https://issues.apache.org/jira/browse/YARN-656 so perhaps > there is a difference between fair scheduler and capacity scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
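The arithmetic behind the YARN-3432 report is easy to reproduce (illustrative helper with hypothetical names, using the numbers from the description: 300 GB total, 50 GB allocated, 10 GB reserved, hence 240 GB available): deriving the total as available + allocated silently drops the reserved memory, while including reserved restores the expected 300 GB. As noted above, the same reasoning applies to vcores.

```java
/** Sketch of the cluster-metrics arithmetic from the report (hypothetical helper). */
class ClusterMetricsSketch {
  // Derivation that loses reserved memory -> under-reports the cluster total.
  static long totalWithoutReserved(long availableMB, long allocatedMB) {
    return availableMB + allocatedMB;
  }

  // Consistent variant: reserved memory is still part of the cluster.
  static long totalWithReserved(long availableMB, long allocatedMB, long reservedMB) {
    return availableMB + allocatedMB + reservedMB;
  }
}
```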
[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052256#comment-15052256 ] Mohammad Shahid Khan commented on YARN-4441: If an application is already finished, then killing the application does not make any sense. From the UI we can call the kill operation, and that event is logged as below: in the audit log we get a success message for the Kill Application Request operation. The problem is in the first if check of the code below. Suppose application_xyz is already finished; when the kill request comes, the target state is KILLED while the actual state is finished, so the first if check is true. The second if check is also true since the target is KILLED, so the killApp method is called and logs USER=dr.who OPERATION=Kill Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_xyz | RMAuditLogger.java:91 USER=dr.who OPERATION=Kill Application Request TARGET=RMWebService RESULT=SUCCESS APPID=application_xyz | RMAuditLogger.java:91 {code} if (!app.getState().toString().equals(targetState.getState())) { // user is attempting to change state. right we only // allow users to kill the app if (targetState.getState().equals(YarnApplicationState.KILLED.toString())) { return killApp(app, callerUGI, hsr); } throw new BadRequestException("Only '" + YarnApplicationState.KILLED.toString() + "' is allowed as a target state."); } {code} I think we should not allow calling killApp if the application is already finished. 
> Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > If the application is already finished ie either failled, killed, or succeded > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
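The guard suggested in the comment above can be sketched as follows (hypothetical names; the real check would live in RMWebServices next to the quoted snippet, and the real states come from YarnApplicationState): reject the kill before it reaches killApp whenever the application is already in a terminal state, so the audit log never records a misleading SUCCESS for a finished app.

```java
import java.util.EnumSet;

/** Hypothetical sketch of rejecting kill requests for finished applications. */
class KillGuardSketch {
  enum AppState { NEW, RUNNING, FINISHED, FAILED, KILLED }

  private static final EnumSet<AppState> TERMINAL =
      EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

  /** Returns true iff a kill request may proceed (and be audit-logged as SUCCESS). */
  static boolean mayKill(AppState current) {
    // Already-terminal apps cannot be killed; logging SUCCESS would be misleading.
    return !TERMINAL.contains(current);
  }
}
```

A caller would check mayKill(app.getState()) before invoking killApp and otherwise return a client error instead of writing a SUCCESS audit entry.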
[jira] [Updated] (YARN-4194) Extend Reservation Definition Langauge (RDL) extensions to support node labels
[ https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Tumanov updated YARN-4194: - Attachment: YARN-4194-v2.patch My patch test returns success: +1 overall __ < Success! > -- | Vote | Subsystem | Runtime | Comment | 0 | pre-patch | 21m 6s| Pre-patch trunk compilation is | | || healthy. | +1 |@author | 0m 0s | The patch does not contain any | | || @author tags. | +1 | tests included | 0m 0s | The patch appears to include 2 new | | || or modified test files. | +1 | javac | 8m 56s| There were no new javac warning | | || messages. | +1 |javadoc | 11m 13s | There were no new javadoc warning | | || messages. | +1 | release audit | 0m 23s| The applied patch does not increase | | || the total number of release audit | | || warnings. | +1 | checkstyle | 2m 3s | There were no new checkstyle | | || issues. | +1 | whitespace | 0m 1s | The patch has no lines that end in | | || whitespace. | +1 |install | 1m 40s| mvn install still works. | +1 |eclipse:eclipse | 0m 45s| The patch built with | | || eclipse:eclipse. | +1 | findbugs | 3m 20s| The patch does not introduce any | | || new Findbugs (version 3.0.0) | | || warnings. | | | 49m 28s | || Subsystem || Report/Notes || | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 50edcb9 | | Java | 1.7.0_91 | > Extend Reservation Definition Langauge (RDL) extensions to support node labels > -- > > Key: YARN-4194 > URL: https://issues.apache.org/jira/browse/YARN-4194 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Alexey Tumanov > Attachments: YARN-4194-v1.patch, YARN-4194-v2.patch > > > This JIRA tracks changes to the APIs to the reservation system to support > the expressivity of node-labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052270#comment-15052270 ] Akira AJISAKA commented on YARN-4358: - +1 pending Jenkins. Thanks Arun, Carlo, and Subru. bq. Interestingly for both me and him the mvn package works just fine (maybe different JDK?). Only JDK8 hits this issue. > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, > YARN-4358.addendum-2.patch, YARN-4358.addendum.patch, YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented some (e.g., max-instantaneous resources) > are easily represented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Yarn Running Application Logs
Hi, I want to collect the logs of a running YARN application. When I try # yarn logs -applicationId it gives the error "application has not completed. Logs are only available after an application completes", but I can see the logs through the ResourceManager web UI. Can anybody help me with how to collect the logs in a file? -- Sincere Regards, A.Kishore Kumar, Ph: +91 9246274575
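For a finished application (with log aggregation enabled), `yarn logs -applicationId` writes the aggregated logs to stdout, which can be redirected to a file. While the application is still running, one workaround is to pull per-container logs from each NodeManager's web interface, the same pages the ResourceManager UI links to. A rough sketch, where the host, port, user, and container ID are hypothetical placeholders (substitute your own):

```shell
# Hypothetical values for illustration only; substitute your own.
NM_HOST="nm1.example.com"            # NodeManager host running the container
NM_HTTP_PORT=8042                    # default NodeManager web UI port
CONTAINER_ID="container_1450000000000_0001_01_000001"
APP_USER="kishore"

# Once the application has finished (and log aggregation is enabled):
#   yarn logs -applicationId <applicationId> > app.log

# While it is still running, each container's logs are served by the NM web UI:
URL="http://${NM_HOST}:${NM_HTTP_PORT}/node/containerlogs/${CONTAINER_ID}/${APP_USER}"
echo "$URL"
# Fetch into a file, e.g.:
#   curl -s "$URL" > container.log
```

The container IDs and their NodeManager hosts are visible on the application attempt page of the ResourceManager UI.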
[jira] [Reopened] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reopened YARN-3998: - Re-opening - I think this is a useful feature for YARN to support. > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > > I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM > launches containers, it could specify the value. Then the NM will re-launch the > container 'retry-times' times when it fails to run (e.g. exit code is not 0). > It will save a lot of time: it avoids container localization, the RM does not > need to re-schedule the container, and local files in the container's working > directory are left for re-use. (If a container has downloaded some big > files, it does not need to re-download them when running again.) > We find this useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051043#comment-15051043 ] Hadoop QA commented on YARN-4340: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 10 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 2s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 2s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 16s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 53s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 51s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 19m 34s {color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 13, now 13). {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 24s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 28m 59s {color} | {color:red} root-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 729, now 729). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 24s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 0s {color} | {color:red} Patch generated 1 new checkstyle issues in root (total was 353, now 352). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 29s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 43s {color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 48s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_91 with JDK v1.7.0_91 generated 3 new issues (was 0, now 3). {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 48s {color} | {color:red}
[jira] [Commented] (YARN-3876) get_executable() assumes everything is Linux
[ https://issues.apache.org/jira/browse/YARN-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051038#comment-15051038 ] Alan Burlison commented on YARN-3876: - In addition, {{get_executable()}} returns a buffer that is never freed by its caller, so there is a memory leak. > get_executable() assumes everything is Linux > > > Key: YARN-3876 > URL: https://issues.apache.org/jira/browse/YARN-3876 > Project: Hadoop YARN > Issue Type: Sub-task > Components: build >Affects Versions: 2.7.0 >Reporter: Alan Burlison > > get_executable() in container-executor.c is non-portable and is hard-coded to > assume Linux's /proc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4422) Generic AHS sometimes doesn't show started, node, or logs on App page
[ https://issues.apache.org/jira/browse/YARN-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051061#comment-15051061 ] Eric Payne commented on YARN-4422: -- bq. Thanks! Will this fix address MAPREDUCE-5502 or MAPREDUCE-4428? It doesn't seem so, but would like to confirm. [~mingma], thanks for your interest. No, this JIRA does not fix the issue documented in MAPREDUCE-5502 or MAPREDUCE-4428. This JIRA only affects the Generic application history server's GUI and not the RM Application GUI. Also, as documented in those JIRAs, the problem is not a missing link in the GUI, but that the log history is missing altogether. > Generic AHS sometimes doesn't show started, node, or logs on App page > - > > Key: YARN-4422 > URL: https://issues.apache.org/jira/browse/YARN-4422 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.3 > > Attachments: AppAttemptPage no container or node.jpg, AppPage no logs > or node.jpg, YARN-4422.001.patch > > > Sometimes the AM container for an app isn't able to start the JVM. This can > happen if bogus JVM options are given to the AM container ( > {{-Dyarn.app.mapreduce.am.command-opts=-InvalidJvmOption}}) or when > misconfiguring the AM container's environment variables > ({{-Dyarn.app.mapreduce.am.env="JAVA_HOME=/foo/bar/baz}}) > When the AM container for an app isn't able to start the JVM, the Application > page for that application shows {{N/A}} for the {{Started}}, {{Node}}, and > {{Logs}} columns. It _does_ have links for each app attempt, and if you click > on one of them, you go to the Application Attempt page, where you can see all > containers with links to their logs and nodes, including the AM container. > But none of that shows up for the app attempts on the Application page. > Also, on the Application Attempt page, in the {{Application Attempt > Overview}} section, the {{AM Container}} value is {{null}} and the {{Node}} > value is {{N/A}}. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051024#comment-15051024 ] Varun Vasudev commented on YARN-3998: - [~hex108] - do you wish to work on this? Do you have a patch that adds support for this? > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > > I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM > launches containers, it could specify the value. Then the NM will re-launch the > container 'retry-times' times when it fails to run (e.g. exit code is not 0). > It will save a lot of time: it avoids container localization, the RM does not > need to re-schedule the container, and local files in the container's working > directory are left for re-use. (If a container has downloaded some big > files, it does not need to re-download them when running again.) > We find this useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
Lin Yiqun created YARN-4440: --- Summary: FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time Key: YARN-4440 URL: https://issues.apache.org/jira/browse/YARN-4440 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} method:
{code}
// default level is NODE_LOCAL
if (!allowedLocalityLevel.containsKey(priority)) {
  allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
  return NodeType.NODE_LOCAL;
}
{code}
On the first invocation it does not record a time in lastScheduledContainer, which causes the following code to execute on the next invocation:
{code}
// check waiting time
long waitTime = currentTimeMs;
if (lastScheduledContainer.containsKey(priority)) {
  waitTime -= lastScheduledContainer.get(priority);
} else {
  waitTime -= getStartTime();
}
{code}
Here waitTime is computed against the FsApp start time, which is earlier than currentTimeMs, so it easily exceeds the delay time and allowedLocality degrades. So we should record an initial time for the priority to prevent comparing against the FsApp start time and degrading the allowedLocalityLevel. This problem has a larger negative impact on small jobs. YARN-4399 also discusses some locality-related problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
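The suggested fix, recording an initial time for the priority on the first invocation, can be sketched with a simplified model of the method. The names and the degrade policy below are illustrative stand-ins, not the actual FSAppAttempt code:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalitySketch {
    enum NodeType { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    private final Map<Integer, NodeType> allowedLocalityLevel = new HashMap<>();
    private final Map<Integer, Long> lastScheduledContainer = new HashMap<>();
    private final long appStartTime;

    LocalitySketch(long appStartTime) { this.appStartTime = appStartTime; }

    NodeType getAllowedLocalityLevelByTime(int priority, long nodeLocalityDelayMs,
                                           long currentTimeMs) {
        if (!allowedLocalityLevel.containsKey(priority)) {
            // The proposed fix: initialize the last-scheduled time here so the
            // next call does not fall back to the (much earlier) app start time.
            lastScheduledContainer.put(priority, currentTimeMs);
            allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
            return NodeType.NODE_LOCAL;
        }
        // Wait time is now measured from the first call, not from app start.
        long waitTime = currentTimeMs
            - lastScheduledContainer.getOrDefault(priority, appStartTime);
        if (waitTime > nodeLocalityDelayMs) {
            allowedLocalityLevel.put(priority, NodeType.RACK_LOCAL); // degrade
        }
        return allowedLocalityLevel.get(priority);
    }
}
```

With this initialization, a call shortly after the first one no longer sees a wait time inflated by the application's start time, so locality is not degraded prematurely.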
[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4440: Attachment: YARN-4440.001.patch > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4440.001.patch > > > It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} > method: > {code} > // default level is NODE_LOCAL > if (!allowedLocalityLevel.containsKey(priority)) { > allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL); > return NodeType.NODE_LOCAL; > } > {code} > On the first invocation it does not record a time in lastScheduledContainer, > which causes the following code to execute on the next invocation: > {code} > // check waiting time > long waitTime = currentTimeMs; > if (lastScheduledContainer.containsKey(priority)) { > waitTime -= lastScheduledContainer.get(priority); > } else { > waitTime -= getStartTime(); > } > {code} > Here waitTime is computed against the FsApp start time, which is earlier than > currentTimeMs, so it easily exceeds the delay time and allowedLocality degrades. > So we should record an initial time for the priority to prevent comparing > against the FsApp start time and degrading the allowedLocalityLevel. This > problem has a larger negative impact on small jobs. YARN-4399 also discusses > some locality-related problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051072#comment-15051072 ] Hadoop QA commented on YARN-4440: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 1s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 32s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 27s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 18m 32s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776819/YARN-4440.001.patch | | JIRA Issue | YARN-4440 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux
[jira] [Assigned] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-4441: - Assignee: Sunil G (was: Mohammad Shahid Khan) > Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Sunil G > > If the application is already finished, i.e. either failed, killed, or succeeded, > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4442) create new application operation from the Rest API should be logged in the audit log
Mohammad Shahid Khan created YARN-4442: -- Summary: create new application operation from the Rest API should be logged in the audit log Key: YARN-4442 URL: https://issues.apache.org/jira/browse/YARN-4442 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.0.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan The create-new-application operation from the REST API ("/ws/v1/cluster/apps/new-application") should be logged in the audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051117#comment-15051117 ] Sunil G commented on YARN-4441: --- Hi [~mohdshahidkhan], this is intentional. If the application is already in a stored final state, we do not return an error, since the application is going to be finished/completed immediately. I am not really sure whether this is what you were expecting; kindly share more details otherwise. > Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > If the application is already finished, i.e. either failed, killed, or succeeded, > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051121#comment-15051121 ] Kuhu Shukla commented on YARN-3102: --- Following the discussion from YARN-4402: Given that we consider the exclude list the canonical truth for decommissioned nodes (which means the {{setDecomissionedNMsMetrics}} call during serviceInit is kept as is), the only way to make these nodes part of the inactiveNodes map (which today is reinitialized to a new empty concurrent map in {{RMActiveServiceContext}} during startup) is to read the node hostnames/IPs from the exclude list and add them to this map, even though we lose the port information. This is because the node would ideally not have the NM process running, and we don't keep that state across RM restarts. What that means is that we add a (NodeId, RMNode) entry where the hostname is legal but the port is a defined invalid value like -1. This allows us to track the nodes that were decommissioned in the previous life cycle of the RM. We can also tweak the GUI to display N/A when the port is -1. Since the {{isValidNode}} check is only on the basis of hostname/IP, this does not affect the rejoining behavior of the node. Requesting [~eepayne] for comments and ideas.
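The proposal above can be sketched as follows: exclude-list hostnames become inactive-node entries with a defined invalid port, and the UI renders N/A for that port. This is a simplified stand-in (a plain host-to-port map) for the real NodeId/RMNode types in the resourcemanager module; the names here are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DecommissionedNodeSketch {
    // Sentinel for "port unknown", as proposed in the comment above.
    static final int UNKNOWN_PORT = -1;

    // host -> last known port; UNKNOWN_PORT when re-read from the exclude list.
    static final Map<String, Integer> inactiveNodes = new ConcurrentHashMap<>();

    /** Re-populate inactive nodes from exclude-list hostnames after an RM restart. */
    static void addDecommissionedHost(String host) {
        inactiveNodes.put(host, UNKNOWN_PORT);
    }

    /** What the web UI's port column would show for such a node. */
    static String displayPort(int port) {
        return port == UNKNOWN_PORT ? "N/A" : Integer.toString(port);
    }
}
```

Because validity checks are keyed on hostname/IP only, a node re-registering after restart would simply replace the sentinel entry with its real port.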
> Decommisioned Nodes not listed in Web UI > > > Key: YARN-3102 > URL: https://issues.apache.org/jira/browse/YARN-3102 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 > Environment: 2 Node Manager and 1 Resource Manager >Reporter: Bibin A Chundatt >Assignee: Kuhu Shukla >Priority: Minor > > Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to > yarn.exlude file In RM1 machine > Add Yarn.exclude with NM1 Host Name > Start the node as listed below NM1,NM2 Resource manager > Now check Nodes decommisioned in /cluster/nodes > Number of decommisioned node is listed as 1 but Table is empty in > /cluster/nodes/decommissioned (detail of Decommision node not shown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
Mohammad Shahid Khan created YARN-4441: -- Summary: Kill application request from the webservice(ui) is showing success even for the finished applications Key: YARN-4441 URL: https://issues.apache.org/jira/browse/YARN-4441 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan If the application is already finished, i.e. either failed, killed, or succeeded, the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4399: Attachment: YARN-4399.002.patch Resolves the compile error. > FairScheduler allocated container should resetSchedulingOpportunities count > of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch, YARN-4399.002.patch > > > There is a bug in FairScheduler's container allocation when the locality configs > are set. When you attempt to assign a container, > {{FSAppAttempt#addSchedulingOpportunity}} is invoked whether or not the > assignment succeeds. If you configure > yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack, the schedulingOpportunity value > influences the locality of containers: if one container is assigned > successfully, its priority's schedulingOpportunity count is increased, and a > second container increases it again. This may degrade that priority's > allowedLocality and cause the container to be handled as a rack request. So I > think that when FairScheduler allocates a container, if the previous container > was assigned, its priority's scheduling count should be reset to 0 so the value > does not influence container allocation in the next iteration; this will > increase the locality of containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4441: -- Assignee: Mohammad Shahid Khan (was: Sunil G) > Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > If the application is already finished, i.e. either failed, killed, or succeeded, > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3876) get_executable() assumes everything is Linux
[ https://issues.apache.org/jira/browse/YARN-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison reassigned YARN-3876: --- Assignee: Alan Burlison > get_executable() assumes everything is Linux > > > Key: YARN-3876 > URL: https://issues.apache.org/jira/browse/YARN-3876 > Project: Hadoop YARN > Issue Type: Sub-task > Components: build >Affects Versions: 2.7.0 >Reporter: Alan Burlison >Assignee: Alan Burlison > > get_executable() in container-executor.c is non-portable and is hard-coded to > assume Linux's /proc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4399: Attachment: YARN-4399.003.patch Resolves the remaining compile error. > FairScheduler allocated container should resetSchedulingOpportunities count > of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch, YARN-4399.002.patch, > YARN-4399.003.patch > > > There is a bug in FairScheduler's container allocation when the locality configs > are set. When you attempt to assign a container, > {{FSAppAttempt#addSchedulingOpportunity}} is invoked whether or not the > assignment succeeds. If you configure > yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack, the schedulingOpportunity value > influences the locality of containers: if one container is assigned > successfully, its priority's schedulingOpportunity count is increased, and a > second container increases it again. This may degrade that priority's > allowedLocality and cause the container to be handled as a rack request. So I > think that when FairScheduler allocates a container, if the previous container > was assigned, its priority's scheduling count should be reset to 0 so the value > does not influence container allocation in the next iteration; this will > increase the locality of containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
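The change proposed in YARN-4399, resetting a priority's scheduling-opportunity count once a container is actually assigned, can be sketched with a minimal stand-in for the real FSAppAttempt counters. The class and method names are illustrative, not the actual Hadoop code:

```java
import java.util.HashMap;
import java.util.Map;

public class SchedulingOpportunities {
    // Per-priority count of scheduling attempts since the last reset.
    private final Map<Integer, Integer> opportunities = new HashMap<>();

    /** Invoked on every scheduling attempt at this priority (current behavior). */
    void addSchedulingOpportunity(int priority) {
        opportunities.merge(priority, 1, Integer::sum);
    }

    int get(int priority) {
        return opportunities.getOrDefault(priority, 0);
    }

    /**
     * The proposed change: called after a successful assignment so that an
     * inflated count does not relax locality for the next container.
     */
    void resetSchedulingOpportunities(int priority) {
        opportunities.put(priority, 0);
    }
}
```

Without the reset, the count keeps growing across successful assignments and crosses the locality thresholds sooner than intended, letting containers fall back to rack-local placement.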
[jira] [Commented] (YARN-4100) Add Documentation for Distributed Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051180#comment-15051180 ] Naganarasimha G R commented on YARN-4100: - Hi [~dian.fu], [~devaraj.k], [~wangda], & [~rohithsharma], Can one or all of you take a look at the documentation update in this JIRA? It would be better for it to go in as part of 2.8.0, as the features are already checked in. > Add Documentation for Distributed Node Labels feature > - > > Key: YARN-4100 > URL: https://issues.apache.org/jira/browse/YARN-4100 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: NodeLabel.html, YARN-4100.v1.001.patch > > > Add Documentation for Distributed Node Labels feature
[jira] [Updated] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4100: Summary: Add Documentation for Distributed and Delegated-Centralized Node Labels feature (was: Add Documentation for Distributed Node Labels feature) > Add Documentation for Distributed and Delegated-Centralized Node Labels > feature > --- > > Key: YARN-4100 > URL: https://issues.apache.org/jira/browse/YARN-4100 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: NodeLabel.html, YARN-4100.v1.001.patch > > > Add Documentation for Distributed Node Labels feature
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051186#comment-15051186 ] Hadoop QA commented on YARN-3226: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} Patch generated 3 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 105, now 106). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 149m 22s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL |