[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-03-05 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350035#comment-14350035
 ] 

Rohith commented on YARN-3273:
--

There are 4 metrics displayed.
# Application current headroom : the headroom sent to the application master in the heartbeat. Once the application is over, this value is displayed as 0.
# Used Application Master Resource : fetched from queueUsage. If the very first application's AM resource already exceeds Max AM Resources, the application is still submitted with that larger resource; this is the current behaviour. Because of this, if a user submits the first application with a large AM resource, the Used AM Resource can display a higher value than Max AM Resource.
# Used Application Master Resource per user : fetched from UserInfo. Displays, per queue, the resources used by each active user.
# Max Resources Per User : calculated as shown below.
{code}
public synchronized Resource getUserResourceLimit() {
  return Resources.multiplyAndNormalizeUp(resourceCalculator,
      absoluteCapacityResource, (userLimit / 100.0f) * userLimitFactor,
      minimumAllocation);
}
{code}
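
For illustration only, here is a minimal standalone sketch of the same arithmetic with made-up numbers (4 GB absolute queue capacity, user-limit 50, user-limit-factor 2, 1 GB minimum allocation). It mirrors the formula above but is not the scheduler code; the roundUp helper stands in for the "normalize up" step.
{code}
// Illustrative sketch of the Max Resources Per User arithmetic shown above.
// All numbers and the roundUp helper are made up for this example; the real
// code uses Resources.multiplyAndNormalizeUp on Resource objects.
public class UserLimitExample {
  // Round value up to the nearest multiple of step (the "normalize up" step).
  static long roundUp(long value, long step) {
    return ((value + step - 1) / step) * step;
  }

  public static void main(String[] args) {
    long absoluteCapacityMB = 4096;   // queue's absolute capacity in MB
    float userLimit = 50f;            // minimum-user-limit-percent
    float userLimitFactor = 2f;       // user-limit-factor
    long minimumAllocationMB = 1024;  // scheduler minimum allocation

    long userResourceLimitMB = roundUp(
        (long) (absoluteCapacityMB * (userLimit / 100.0f) * userLimitFactor),
        minimumAllocationMB);

    // 4096 * 0.5 * 2 = 4096 MB, already a multiple of 1024
    System.out.println("Max Resources Per User (MB): " + userResourceLimitMB);
  }
}
{code}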

Kindly review the patch

> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Attachments: 0001-YARN-3273-v1.patch, 
> YARN-3273-am-resource-used-AND-User-limit.PNG, 
> YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  





[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-03-05 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350027#comment-14350027
 ] 

Rohith commented on YARN-3273:
--

Attached a screenshot of the web UI changes, and attached the version-1 patch too.

> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Attachments: 0001-YARN-3273-v1.patch, 
> YARN-3273-am-resource-used-AND-User-limit.PNG, 
> YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  





[jira] [Updated] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-03-05 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3273:
-
Attachment: YARN-3273-am-resource-used-AND-User-limit.PNG

> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Attachments: 0001-YARN-3273-v1.patch, 
> YARN-3273-am-resource-used-AND-User-limit.PNG, 
> YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  





[jira] [Updated] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-03-05 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3273:
-
Attachment: 0001-YARN-3273-v1.patch

> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Attachments: 0001-YARN-3273-v1.patch, 
> YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  





[jira] [Updated] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-03-05 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3273:
-
Attachment: YARN-3273-application-headroom.PNG

> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Attachments: YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  





[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350023#comment-14350023
 ] 

Jian He commented on YARN-3021:
---

bq.  it seems that "mapreduce.job.hdfs-servers.token-renewal.exclude" is still 
going to be a user-facing API only used for short-term
Yes, it is an MR-land config. The difference is that YARN won't need to expose 
an API used only by MR. After all, we are providing a temporary solution for MR 
itself, right?
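
For reference, a minimal sketch of how a client job might set that MR-side property before submission; the namenode host name below is made up, and this only illustrates the exclusion hint discussed here, not a YARN API.
{code}
// Minimal sketch (hypothetical host name): ask the MR client not to have the
// RM renew tokens for the remote cluster's namenode, so a cross-realm DistCp
// submission does not fail on renewDelegationToken.
import org.apache.hadoop.conf.Configuration;

public class ExcludeRenewalExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("mapreduce.job.hdfs-servers.token-renewal.exclude",
        "nn.cluster-b.example.com");
    // ... pass conf to the DistCp/MR job submission as usual ...
  }
}
{code}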

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.
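
As a rough sketch of the behavior proposed above (the method names are placeholders, not the actual DelegationTokenRenewer code), the renewal is attempted, but a failure only stops future renewals instead of failing the submission:
{code}
// Sketch only: attempt the renewal, but on failure merely skip scheduling
// further renewals instead of bubbling the error back to the client.
public class TolerantRenewalExample {
  // Placeholder for the real token renewal call, which can fail across
  // one-way trust setups because the remote realm rejects the RM's renewer.
  static long renewToken() throws Exception {
    throw new Exception("remote realm does not trust the RM principal");
  }

  public static void main(String[] args) {
    try {
      long newExpiry = renewToken();
      System.out.println("Renewed; schedule the next renewal before " + newExpiry);
    } catch (Exception e) {
      // Old JobTracker-style behavior: log and stop scheduling renewals for
      // this token, but let the application submission proceed.
      System.out.println("Renewal failed (" + e.getMessage()
          + "); skipping further renewals, not failing the app");
    }
  }
}
{code}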





[jira] [Commented] (YARN-3111) Fix ratio problem on FairScheduler page

2015-03-05 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350017#comment-14350017
 ] 

Ashwin Shankar commented on YARN-3111:
--

[~ka...@cloudera.com],
Yes, when cluster capacity is 0 it would be nice to show the ratio as 0.
Yes, it makes sense to show both resources as percentages for the shares, but I 
can't think of a good way of displaying that on the bars.
Did you have anything in mind?
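
As a side note, a minimal sketch of the zero-capacity guard being discussed; the method and variable names are illustrative, not the FairScheduler page code.
{code}
// Guard against a zero cluster capacity so the page shows 0% instead of "NaN%".
public class RatioExample {
  static float usedPercentage(long usedMB, long clusterCapacityMB) {
    if (clusterCapacityMB <= 0) {
      return 0f;   // no NodeManagers registered yet
    }
    return (usedMB * 100.0f) / clusterCapacityMB;
  }

  public static void main(String[] args) {
    System.out.println(usedPercentage(0, 0));      // 0.0 instead of NaN
    System.out.println(usedPercentage(512, 4096)); // 12.5
  }
}
{code}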

> Fix ratio problem on FairScheduler page
> ---
>
> Key: YARN-3111
> URL: https://issues.apache.org/jira/browse/YARN-3111
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3111.1.patch, YARN-3111.png
>
>
> Found 3 problems on the FairScheduler page:
> 1. Only memory is used to compute the ratio, even when the queue 
> schedulingPolicy is DRF.
> 2. When min resources is configured larger than the real resources, the steady 
> fair share ratio is so large that it runs off the page.
> 3. When cluster resources are 0 (no NodeManager started), the ratio is 
> displayed as "NaN% used".
> The attached image shows a snapshot of the above problems.





[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350013#comment-14350013
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

{quote}
My only concern is not to add a user-facing API only used for short-term.
{quote}

I thought about it a bit more: it seems that 
"mapreduce.job.hdfs-servers.token-renewal.exclude" is still going to be a 
user-facing API used only for the short term, because once we introduce an 
external renewer, the tokens will need to be assigned to that renewer; after 
all, we want the tokens to be renewed. Right? Or are there use cases where we 
really don't want to renew?

That said, I think the solution would solve our current problem.

Thanks.
 

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.





[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-03-05 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350005#comment-14350005
 ] 

Tsuyoshi Ozawa commented on YARN-1809:
--

Great job, Xuan, Zhijie, and Jian!

> Synchronize RM and Generic History Service Web-UIs
> --
>
> Key: YARN-1809
> URL: https://issues.apache.org/jira/browse/YARN-1809
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Fix For: 2.7.0
>
> Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
> YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, 
> YARN-1809.14.patch, YARN-1809.15-rebase.patch, YARN-1809.15.patch, 
> YARN-1809.16.patch, YARN-1809.17.patch, YARN-1809.17.rebase.patch, 
> YARN-1809.17.rebase.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
> YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
> YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch
>
>
> After YARN-953, the web UI of the generic history service provides more 
> information than that of the RM, namely the details about app attempts and 
> containers. It's good to provide similar web UIs but to retrieve the data from 
> separate sources, i.e., the RM cache and the history store respectively.





[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS

2015-03-05 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349998#comment-14349998
 ] 

Xuan Gong commented on YARN-3300:
-

I think we should move the locality_table from the app page to the attempt page. 
Will fix that issue here, too.

> outstanding_resource_requests table should not be shown in AHS
> --
>
> Key: YARN-3300
> URL: https://issues.apache.org/jira/browse/YARN-3300
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>






[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349996#comment-14349996
 ] 

Hudson commented on YARN-1809:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7270 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7270/])
YARN-1809. Synchronize RM and TimeLineServer Web-UIs. Contributed by Zhijie 
Shen and Xuan Gong (jianhe: rev 95bfd087dc89e57a93340604cc8b96042fa1a05a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlockWithMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/ContainerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnWebParams.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestAppPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/ResponseInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/

[jira] [Created] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI

2015-03-05 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-3301:
---

 Summary: Fix the format issue of the new RM web UI and AHS web UI
 Key: YARN-3301
 URL: https://issues.apache.org/jira/browse/YARN-3301
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong








[jira] [Created] (YARN-3300) outstanding_resource_requests table should not be shown in AHS

2015-03-05 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-3300:
---

 Summary: outstanding_resource_requests table should not be shown 
in AHS
 Key: YARN-3300
 URL: https://issues.apache.org/jira/browse/YARN-3300
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong








[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-03-05 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1809:

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-3299

> Synchronize RM and Generic History Service Web-UIs
> --
>
> Key: YARN-1809
> URL: https://issues.apache.org/jira/browse/YARN-1809
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Fix For: 2.7.0
>
> Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
> YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, 
> YARN-1809.14.patch, YARN-1809.15-rebase.patch, YARN-1809.15.patch, 
> YARN-1809.16.patch, YARN-1809.17.patch, YARN-1809.17.rebase.patch, 
> YARN-1809.17.rebase.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
> YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
> YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch
>
>
> After YARN-953, the web UI of the generic history service provides more 
> information than that of the RM, namely the details about app attempts and 
> containers. It's good to provide similar web UIs but to retrieve the data from 
> separate sources, i.e., the RM cache and the history store respectively.





[jira] [Updated] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command

2015-03-05 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3284:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-3299

> Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN 
> command
> -
>
> Key: YARN-3284
> URL: https://issues.apache.org/jira/browse/YARN-3284
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>
> Currently, we have some extra metrics about the application and the current 
> attempt in the RM web UI. We should expose that information through the YARN 
> command, too.
> 1. Preemption metrics
> 2. application outstanding resource requests
> 3. container locality info





[jira] [Updated] (YARN-1884) ContainerReport should have nodeHttpAddress

2015-03-05 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1884:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-3299

> ContainerReport should have nodeHttpAddress
> ---
>
> Key: YARN-1884
> URL: https://issues.apache.org/jira/browse/YARN-1884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> In the web UI, we're going to show the node, which used to be a link to the NM 
> web page. However, on the AHS web UI, and on the RM web UI after YARN-1809, the 
> node field has to be set to the nodeID where the container is allocated. We 
> need to add nodeHttpAddress to the ContainerReport to link users to the NM web 
> page.





[jira] [Created] (YARN-3299) Synchronize RM and Generic History Service Web-UIs

2015-03-05 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-3299:
---

 Summary: Synchronize RM and Generic History Service Web-UIs
 Key: YARN-3299
 URL: https://issues.apache.org/jira/browse/YARN-3299
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong


After YARN-1809, we are using the same kind of protocol to fetch the information 
displayed in each web UI: the RM web UI uses ApplicationClientProtocol, and the 
Generic History Service web UI uses ApplicationHistoryProtocol. Both extend the 
same base protocol.
Also, we have common appblock/attemptblock/containerblock implementations shared 
by both the RM web UI and the ATS web UI.

But we are still missing some information, such as outstanding resource 
requests, preemption metrics, etc.

This ticket will be used as the parent ticket to track all remaining issues for 
the RM web UI and the ATS web UI.







[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-03-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349962#comment-14349962
 ] 

Jian He commented on YARN-1809:
---

The RM and AHS web UIs cannot be completely in sync, because some information, 
like the outstanding resource requests table, only makes sense in the RM. We'll 
fix the remaining issues in follow-up JIRAs.

Committing this.

> Synchronize RM and Generic History Service Web-UIs
> --
>
> Key: YARN-1809
> URL: https://issues.apache.org/jira/browse/YARN-1809
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
> YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, 
> YARN-1809.14.patch, YARN-1809.15-rebase.patch, YARN-1809.15.patch, 
> YARN-1809.16.patch, YARN-1809.17.patch, YARN-1809.17.rebase.patch, 
> YARN-1809.17.rebase.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
> YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
> YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch
>
>
> After YARN-953, the web UI of the generic history service provides more 
> information than that of the RM, namely the details about app attempts and 
> containers. It's good to provide similar web UIs but to retrieve the data from 
> separate sources, i.e., the RM cache and the history store respectively.





[jira] [Commented] (YARN-2190) Add CPU and memory limit options to the default container executor for Windows containers

2015-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349880#comment-14349880
 ] 

Hadoop QA commented on YARN-2190:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702956/YARN-2190.13.patch
  against trunk revision 952640f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6878//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6878//console

This message is automatically generated.

> Add CPU and memory limit options to the default container executor for 
> Windows containers
> -
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.11.patch, YARN-2190.12.patch, 
> YARN-2190.13.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to add the new options to the executor that can 
> set the limits on job objects thus provides resource enforcement at OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx





[jira] [Commented] (YARN-2190) Add CPU and memory limit options to the default container executor for Windows containers

2015-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349845#comment-14349845
 ] 

Hadoop QA commented on YARN-2190:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702948/YARN-2190.12.patch
  against trunk revision 952640f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6877//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6877//console

This message is automatically generated.

> Add CPU and memory limit options to the default container executor for 
> Windows containers
> -
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.11.patch, YARN-2190.12.patch, 
> YARN-2190.13.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to add the new options to the executor that can 
> set the limits on job objects thus provides resource enforcement at OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx





[jira] [Updated] (YARN-2190) Add CPU and memory limit options to the default container executor for Windows containers

2015-03-05 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:

Attachment: YARN-2190.13.patch

Uploaded a new patch that updates "BUILDING.txt".

> Add CPU and memory limit options to the default container executor for 
> Windows containers
> -
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.11.patch, YARN-2190.12.patch, 
> YARN-2190.13.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to add the new options to the executor that can 
> set the limits on job objects thus provides resource enforcement at OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx





[jira] [Updated] (YARN-2190) Add CPU and memory limit options to the default container executor for Windows containers

2015-03-05 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:

Description: 
Yarn default container executor on Windows does not set the resource limit on 
the containers currently. The memory limit is enforced by a separate monitoring 
thread. The container implementation on Windows uses Job Object right now. The 
latest Windows (8 or later) API allows CPU and memory limits on the job 
objects. We want to add the new options to the executor that can set the limits 
on job objects thus provides resource enforcement at OS level.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx

  was:
Yarn default container executor on Windows does not set the resource limit on 
the containers currently. The memory limit is enforced by a separate monitoring 
thread. The container implementation on Windows uses Job Object right now. The 
latest Windows (8 or later) API allows CPU and memory limits on the job 
objects. We want to create a Windows container executor that sets the limits on 
job objects thus provides resource enforcement at OS level.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx


> Add CPU and memory limit options to the default container executor for 
> Windows containers
> -
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.11.patch, YARN-2190.12.patch, 
> YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, 
> YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, YARN-2190.9.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to add the new options to the executor that can 
> set the limits on job objects thus provides resource enforcement at OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx





[jira] [Updated] (YARN-2190) Add CPU and memory limit options to the default container executor for Windows containers

2015-03-05 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:

Summary: Add CPU and memory limit options to the default container executor 
for Windows containers  (was: Provide a Windows container executor that can 
limit memory and CPU)

> Add CPU and memory limit options to the default container executor for 
> Windows containers
> -
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.11.patch, YARN-2190.12.patch, 
> YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, 
> YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, YARN-2190.9.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects thus provides resource enforcement at OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx





[jira] [Updated] (YARN-3291) DockerContainerExecutor should run as a non-root user inside the container

2015-03-05 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-3291:
--
Summary: DockerContainerExecutor should run as a non-root user inside the 
container  (was: DockerContainerExecutor should run as a non-root user)

> DockerContainerExecutor should run as a non-root user inside the container
> --
>
> Key: YARN-3291
> URL: https://issues.apache.org/jira/browse/YARN-3291
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Abin Shahab
>
> Currently DockerContainerExecutor runs the container as root (inside the 
> container). Outside the container it runs as yarn. Inside the container it can 
> be run as a non-root user.





[jira] [Updated] (YARN-3291) DockerContainerExecutor should run as a non-root user

2015-03-05 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-3291:
--
Description: Currently DockerContainerExecutor runs container as 
root(inside the container). Outside the container it runs as yarn. Inside the 
this can be run as the user which is not root.  (was: Currently 
DockerContainerExecutor runs container as root. This can be run as the user 
which is not root.)

> DockerContainerExecutor should run as a non-root user
> -
>
> Key: YARN-3291
> URL: https://issues.apache.org/jira/browse/YARN-3291
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Abin Shahab
>
> Currently DockerContainerExecutor runs the container as root (inside the 
> container). Outside the container it runs as yarn. Inside the container it can 
> be run as a non-root user.





[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-05 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:

Attachment: YARN-2190.12.patch

Uploaded a new patch. After further discussion with [~venkateshrin] and [~jianhe], 
we decided to move the CPU and memory limits to the DefaultContainerExecutor. 
Both are turned off by default, so there should be no change to the existing 
behavior.
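
A rough sketch of the "off by default" pattern described above; the configuration key names are placeholders, not the actual keys introduced by the patch.
{code}
// Sketch only: limits are applied solely when explicitly enabled, so the
// default behavior is unchanged. The key names below are placeholders.
import org.apache.hadoop.conf.Configuration;

public class LimitFlagsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    boolean cpuLimitEnabled =
        conf.getBoolean("example.windows.container.cpu.limit.enabled", false);
    boolean memLimitEnabled =
        conf.getBoolean("example.windows.container.memory.limit.enabled", false);

    if (cpuLimitEnabled || memLimitEnabled) {
      // Only here would the executor attach Job Object CPU/memory limits.
      System.out.println("Applying OS-level limits to the container job object");
    } else {
      System.out.println("Limits disabled (default): existing behavior is kept");
    }
  }
}
{code}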

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.11.patch, YARN-2190.12.patch, 
> YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, 
> YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, YARN-2190.9.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects thus provides resource enforcement at OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx





[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349708#comment-14349708
 ] 

Wangda Tan commented on YARN-2495:
--

1) The changes to CommonNodeLabelsManager are unnecessary.

2) NodeHeartbeat's protobuf field name should be consistent with the Java class 
method name:
nodeLabels -> areNodeLabelsSet

3) NodeHeartbeatRequestPBImpl: the getter/setter should be consistent, e.g. 
get/setAreNodeLabelsSet.

4) Same comment on RegisterNodeManagerRequestPBImpl

5) Integration with NM (NodeManager.java):
I think we may not need to check the centralized/distributed configuration here; 
centralized vs. distributed is a config on the RM side. On the NM side the 
question is only how to get the node labels: if the user doesn't configure any 
script file for it, the provider should be null and no NodeLabelProviderService 
instance will be added to the NM (see the sketch after this list).

So back to the code, you can just leave getNodeLabelsProviderService(..), which 
will be implemented in YARN-2729.

If you agree, we need to change the name {{isDistributedNodeLabelsConf}} to 

6) Why change this to notifyAll? I think it may be unnecessary:
{code}
synchronized (this.heartbeatMonitor) {
  this.heartbeatMonitor.notify();
}
{code}

7) isDistributedNodeLabels does not seem necessary here; if you agree with 5), 
it's better to remove the field.

8) Add a null check or a comment (stating that provider-returned node labels 
will always be non-null) for areNodeLabelsUpdated in NodeStatusUpdaterImpl.

9) Since we already have TestNodeStatusUpdater, it's better to merge 
TestNodeStatusUpdaterForLabels into it.

10) ResourceTrackerService:
When illegal labels are found during registration, this should be handled before 
instantiating RMNodeImpl.
Maybe you can still merge some common code for register/update?

I will review more of the tests in the next pass. And could you take a look at 
the failed tests/findbugs warnings, etc.?
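
To make point 5 above concrete, a small self-contained sketch of the suggested NM-side wiring; every name here is invented for the example, it is not the patch code.
{code}
// Sketch of point 5: the NM only cares about *how* to obtain labels. If no
// provider (e.g. a script) is configured, no provider service is added at all.
import java.util.Collections;
import java.util.Set;

public class ProviderWiringExample {
  interface NodeLabelsProvider { Set<String> getNodeLabels(); }

  // Returns null when the admin has not configured any label script/source.
  static NodeLabelsProvider createProvider(String configuredScript) {
    if (configuredScript == null || configuredScript.isEmpty()) {
      return null;
    }
    return () -> Collections.singleton("GPU"); // placeholder provider
  }

  public static void main(String[] args) {
    NodeLabelsProvider provider = createProvider(null); // nothing configured
    if (provider != null) {
      System.out.println("Provider added; heartbeats will carry labels");
    } else {
      System.out.println("No provider configured; heartbeats carry no labels");
    }
  }
}
{code}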


> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels





[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.

2015-03-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349705#comment-14349705
 ] 

Zhijie Shen commented on YARN-3287:
---

Btw, {{getAuthenticationMethod() == 
UserGroupInformation.AuthenticationMethod.PROXY}} is equivalent to {{realUgi != 
null}}, right?
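
For context, a bare-bones sketch of the UGI selection and doAs wrapping under discussion; the posting body is a placeholder, not the TimelineClient code.
{code}
// Bare-bones sketch: pick the UGI that actually holds credentials (the real
// user for a proxy-user UGI) and run the operation inside its doAs.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class DoAsExample {
  public static void main(String[] args) throws Exception {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    // If this is a proxy-user UGI, its real user holds the login credentials.
    UserGroupInformation realUgi =
        ugi.getRealUser() != null ? ugi.getRealUser() : ugi;

    String result = realUgi.doAs((PrivilegedExceptionAction<String>) () -> {
      // Placeholder for the actual posting/authentication call.
      return "posted as " + UserGroupInformation.getCurrentUser().getUserName();
    });
    System.out.println(result);
  }
}
{code}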

> TimelineClient kerberos authentication failure uses wrong login context.
> 
>
> Key: YARN-3287
> URL: https://issues.apache.org/jira/browse/YARN-3287
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Daryn Sharp
> Attachments: timeline.patch
>
>
> TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause 
> failure for yarn clients to create timeline domains during job submission.





[jira] [Comment Edited] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.

2015-03-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349674#comment-14349674
 ] 

Zhijie Shen edited comment on YARN-3287 at 3/6/15 12:27 AM:


One concern: moving the UGI computation into init will make the client work only 
for the user who initializes it. It cannot be shared with other users; for 
example, the RM may want to reuse one client to publish data on behalf of 
different application users.


was (Author: zjshen):
One concern: moving ugi computation into init will make the client only work 
for the user who inits it. It cannot be shared with the other users.

> TimelineClient kerberos authentication failure uses wrong login context.
> 
>
> Key: YARN-3287
> URL: https://issues.apache.org/jira/browse/YARN-3287
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Daryn Sharp
> Attachments: timeline.patch
>
>
> TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause 
> failure for yarn clients to create timeline domains during job submission.





[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.

2015-03-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349674#comment-14349674
 ] 

Zhijie Shen commented on YARN-3287:
---

One concern: moving ugi computation into init will make the client only work 
for the user who inits it. It cannot be shared with the other users.

> TimelineClient kerberos authentication failure uses wrong login context.
> 
>
> Key: YARN-3287
> URL: https://issues.apache.org/jira/browse/YARN-3287
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Daryn Sharp
> Attachments: timeline.patch
>
>
> TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause 
> failure for yarn clients to create timeline domains during job submission.





[jira] [Assigned] (YARN-3227) Timeline renew delegation token fails when RM user's TGT is expired

2015-03-05 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-3227:
-

Assignee: Zhijie Shen

> Timeline renew delegation token fails when RM user's TGT is expired
> ---
>
> Key: YARN-3227
> URL: https://issues.apache.org/jira/browse/YARN-3227
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Zhijie Shen
>Priority: Critical
>
> When the RM user's Kerberos TGT has expired, the RM's renew-delegation-token 
> operation fails as part of job submission. The expected behavior is that the RM 
> will re-login to get a new TGT.
> {quote}
> 2015-02-06 18:54:05,617 [DelegationTokenRenewer #25954] WARN
> security.DelegationTokenRenewer: Unable to add the application to the
> delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN,
> Service: timelineserver.example.com:4080, Ident: (owner=user,
> renewer=rmuser, realUser=oozie, issueDate=1423248845528,
> maxDate=1423853645528, sequenceNumber=9716, masterKeyId=9)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:443)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:77)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:808)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:789)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.IOException: HTTP status [401], message [Unauthorized]
> at
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:374)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:360)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:429)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:161)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:444)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:378)
> at
> org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:532)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:529)
> {quote}
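
As a rough illustration of the expected re-login behavior described above (not the actual fix), the renewer would re-check the login user's TGT before attempting the renewal:
{code}
// Rough illustration only: make sure the RM's login user has a fresh TGT
// before the delegation token renewal call is attempted.
import org.apache.hadoop.security.UserGroupInformation;

public class RenewWithReloginExample {
  static void renewTimelineToken() throws Exception {
    UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
    // Re-login from the keytab if the ticket has expired or is about to.
    loginUser.checkTGTAndReloginFromKeytab();
    // ... proceed with the delegation token renewal call ...
  }
}
{code}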





[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349658#comment-14349658
 ] 

Hadoop QA commented on YARN-1809:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702918/YARN-1809.17.rebase.patch
  against trunk revision 952640f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6876//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6876//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6876//console

This message is automatically generated.

> Synchronize RM and Generic History Service Web-UIs
> --
>
> Key: YARN-1809
> URL: https://issues.apache.org/jira/browse/YARN-1809
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
> YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, 
> YARN-1809.14.patch, YARN-1809.15-rebase.patch, YARN-1809.15.patch, 
> YARN-1809.16.patch, YARN-1809.17.patch, YARN-1809.17.rebase.patch, 
> YARN-1809.17.rebase.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
> YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
> YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch
>
>
> After YARN-953, the web UI of the generic history service provides more 
> information than that of the RM, namely the details about app attempts and 
> containers. It's good to provide similar web UIs but to retrieve the data from 
> separate sources, i.e., the RM cache and the history store respectively.





[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-03-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349644#comment-14349644
 ] 

Jian He commented on YARN-3136:
---

bq. Speaking of accessing the applications map without holding a proper lock, 
doesn't AbstractYarnScheduler.createReleaseCache() do exactly that?
createReleaseCache is only called in serviceInit, so I think it should be fine.

I have a general question: is AbstractYarnScheduler supposed to be public for 
external use? I think AbstractYarnScheduler is just a common base class meant to 
avoid code duplication among the CS/Fair/Fifo schedulers, and we'll most likely 
add more changes to this class. IMHO, to not complicate things, we can just mark 
AbstractYarnScheduler as Private/Unstable. This method was added a year ago and 
will now be invoked only if work-preserving AM restart is enabled. Maybe I'm 
wrong, but I doubt we'll see compatibility issues in practice.


> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 
> 0003-YARN-3136.patch, 0004-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349578#comment-14349578
 ] 

Zhijie Shen commented on YARN-3264:
---

+1 LGTM, will commit the patch

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch, YARN-3264.005.patch, 
> YARN-3264.006.patch, YARN-3264.007.patch, YARN-3264.008.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-03-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349575#comment-14349575
 ] 

Jason Lowe commented on YARN-3136:
--

Thanks for updating the patch, Sunil.  Comments:

I'd like to see a comment on the method stating why it's synchronizing and that 
it should be overridden for performance if the underlying map supports 
concurrent access.  Also we're inconsistent about using the new method.  There 
are plenty of places, both in AbstractYarnScheduler and otherwise, that still 
call applications.get rather than the new method to abstract it.

The FairScheduler and FifoScheduler also use concurrent maps, shouldn't they 
override the method as well?

Speaking of accessing the applications map without holding a proper lock, 
doesn't AbstractYarnScheduler.createReleaseCache() do exactly that?  Seems like 
the underlying map needs to be concurrent for that code to iterate the map 
without holding a lock, and either it's safe to assume it is concurrent (and 
thus we don't need all this extra get method overhead and related changes), or 
createReleaseCache() is broken for third-party schedulers who didn't bother to 
use a concurrent map.
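
To illustrate the pattern I have in mind (hypothetical class and method names, not the actual patch): the base accessor synchronizes because its map may not be concurrent, while a scheduler whose map is concurrent overrides it to avoid taking the lock.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BaseScheduler {
  protected Map<String, Object> applications = new HashMap<>();

  // Synchronized because 'applications' is not guaranteed to be a concurrent
  // map; override for performance if the underlying map supports concurrent
  // access.
  public synchronized Object getApplication(String appId) {
    return applications.get(appId);
  }
}

class ConcurrentMapScheduler extends BaseScheduler {
  ConcurrentMapScheduler() {
    applications = new ConcurrentHashMap<>();
  }

  @Override
  public Object getApplication(String appId) {
    // Concurrent map: safe to read without taking the scheduler lock.
    return applications.get(appId);
  }
}
{code}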

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 
> 0003-YARN-3136.patch, 0004-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3298) User-limit should be enforced in CapacityScheduler

2015-03-05 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3298:


 Summary: User-limit should be enforced in CapacityScheduler
 Key: YARN-3298
 URL: https://issues.apache.org/jira/browse/YARN-3298
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, yarn
Reporter: Wangda Tan
Assignee: Wangda Tan


User-limit is not treated as a hard limit for now: it does not consider the 
required resource (the resource of the being-allocated resource request), and 
when a user's used resource equals the user-limit, allocation will still continue. 
This will generate jitter issues when we have YARN-2069 (the preemption policy kills 
a container under a user, and the scheduler allocates a container under the same user 
soon after).

The expected behavior should be the same as queue capacity:
only when user.usage + required <= user-limit should the queue continue to allocate a 
container.
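
A minimal sketch of the intended check, assuming a simplified single-dimension resource (memory only) and hypothetical names; the real check would likely go through the scheduler's resource calculator:
{code}
final class UserLimitCheckSketch {
  /**
   * Allocate only when the user's current usage plus the resource being
   * requested still fits under the user limit, mirroring how queue capacity
   * is enforced.
   */
  static boolean canAllocate(long userUsedMB, long requiredMB, long userLimitMB) {
    return userUsedMB + requiredMB <= userLimitMB;
  }

  public static void main(String[] args) {
    // usage = 50 GB, request = 4 GB, limit = 53 GB -> reject instead of
    // letting the user exceed the limit with one more container.
    System.out.println(canAllocate(50 * 1024, 4 * 1024, 53 * 1024)); // false
  }
}
{code}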



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-03-05 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1809:

Attachment: YARN-1809.17.rebase.patch

> Synchronize RM and Generic History Service Web-UIs
> --
>
> Key: YARN-1809
> URL: https://issues.apache.org/jira/browse/YARN-1809
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
> YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, 
> YARN-1809.14.patch, YARN-1809.15-rebase.patch, YARN-1809.15.patch, 
> YARN-1809.16.patch, YARN-1809.17.patch, YARN-1809.17.rebase.patch, 
> YARN-1809.17.rebase.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
> YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
> YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch
>
>
> After YARN-953, the web-UI of the generic history service provides more 
> information than that of the RM, e.g., the details about app attempts and containers. 
> It's good to provide similar web-UIs, but retrieve the data from separate 
> sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3275) CapacityScheduler: Preemption happening on non-preemptable queues

2015-03-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349540#comment-14349540
 ] 

Jason Lowe commented on YARN-3275:
--

+1 lgtm.  Findbugs warnings appear to be unrelated.  Will commit this tomorrow 
if there are no further comments or objections.

> CapacityScheduler: Preemption happening on non-preemptable queues
> -
>
> Key: YARN-3275
> URL: https://issues.apache.org/jira/browse/YARN-3275
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: capacity-scheduler
> Attachments: YARN-3275.v1.txt, YARN-3275.v2.txt
>
>
> YARN-2056 introduced the ability to turn preemption on and off at the queue 
> level. In cases where a queue goes over its absolute max capacity (YARN-3243, 
> for example), containers can be preempted from that queue, even though the 
> queue is marked as non-preemptable.
> We are using this feature in large, busy clusters and seeing this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-05 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3264:
-
Attachment: YARN-3264.008.patch


[~rkanter] 
Thanks, I updated the patch with the renamed file and constant name.

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch, YARN-3264.005.patch, 
> YARN-3264.006.patch, YARN-3264.007.patch, YARN-3264.008.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private

2015-03-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349511#comment-14349511
 ] 

Karthik Kambatla commented on YARN-3296:


Agree with setting it to Public-Evolving. I think we should make the methods 
uniform along with marking it Public. At the very least, I think the following 
methods shouldn't be abstract: getCumulativeVmem, getCumulativeRssmem, 
getCumulativeCpuTime, getCpuUsagePercent. These methods should probably return 
UNAVAILABLE (=0). 
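
A rough sketch of that idea (simplified class shape, not an actual patch; the UNAVAILABLE value is assumed to be 0 per the suggestion above):
{code}
public abstract class ResourceCalculatorProcessTreeSketch {
  public static final long UNAVAILABLE = 0;

  // Non-abstract defaults so external plugins keep compiling when new
  // metrics are added; subclasses override whatever they can actually report.
  public long getCumulativeVmem() { return UNAVAILABLE; }
  public long getCumulativeRssmem() { return UNAVAILABLE; }
  public long getCumulativeCpuTime() { return UNAVAILABLE; }
  public float getCpuUsagePercent() { return UNAVAILABLE; }

  // Only the core process-tree refresh stays abstract.
  public abstract void updateProcessTree();
}
{code}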

[~hitesh], [~jianhe] - what do you think? 

> yarn.nodemanager.container-monitor.process-tree.class is configurable but 
> ResourceCalculatorProcessTree class is marked Private
> ---
>
> Key: YARN-3296
> URL: https://issues.apache.org/jira/browse/YARN-3296
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: YARN-3296.1.patch
>
>
> Given that someone can implement their custom plugin for resource monitoring 
> and configure the NM to use it, this class should be marked public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-03-05 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349507#comment-14349507
 ] 

Sidharta Seethana commented on YARN-3080:
-

[~vvasudev] ,

[~ashahab] is right - docker run ( *irrespective of whether we run in detached 
mode or not* ) talks to the docker daemon which in turn sets up the appropriate 
environment before launching the process. For better or worse, the 'contained' 
process is always a child of the docker daemon and has no relation (from a 
process tree perspective) to the process that executes 'docker run'.  

> The DockerContainerExecutor could not write the right pid to container pidFile
> --
>
> Key: YARN-3080
> URL: https://issues.apache.org/jira/browse/YARN-3080
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Beckham007
>Assignee: Abin Shahab
> Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
> YARN-3080.patch
>
>
> The docker_container_executor_session.sh is like this:
> {quote}
> #!/usr/bin/env bash
> echo `/usr/bin/docker inspect --format {{.State.Pid}} 
> container_1421723685222_0008_01_02` > 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
> /bin/mv -f 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
>  
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
> /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
> GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
> GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
> GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
> GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
> --cpu-shares=1024 -v 
> /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
>  -v 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
>  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
> "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh"
> {quote}
> The DockerContainerExecutor runs docker inspect before docker run, so 
> docker inspect couldn't get the right pid for the docker container, and signalContainer() 
> and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-05 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349504#comment-14349504
 ] 

Robert Kanter commented on YARN-3264:
-

Looks good.

Two minor things:
- {{TIMELINE_SERVICE_STORAGAGE_EXTENSION}} in {{FileSystemTimelineWriterImpl}} 
has a typo.
- Would it make sense to rename {{TimelineAggregationGranularity}} to 
{{TimelineAggregationTrack}}?  That seems clearer about what it's for, and 
more in line with what all the comments say about it.

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch, YARN-3264.005.patch, 
> YARN-3264.006.patch, YARN-3264.007.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349498#comment-14349498
 ] 

Karthik Kambatla commented on YARN-3122:


[~venkateshrin] - thanks for chiming in. 

Given that ResourceCalculatorProcessTree was @Private, I was okay with 
including an abstract method. As [~hitesh] points out on YARN-3296, I understand 
why we would want to make it @Public. Let us continue the conversation there. 

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.7.0
>
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349494#comment-14349494
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks a lot Jian,

Good suggestion on #1, and I agree that "in reality I don't foresee big 
breakage".  I further discussed it with [~adhoot], and he agrees with this approach too.

Hi [~vinodkv] and [~qwertymaniac], do you have any comments on the approach that Jian 
described above?

If there is no objection, I will try to work out a revised patch asap.

Thanks.





> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because B's realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.
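
A minimal sketch of that proposed behavior (hypothetical names, not the RM code): attempt the renewal, and on failure skip scheduling further renewals instead of failing the submission.
{code}
class TokenRenewalSketch {
  interface DelegationToken {
    long renew() throws Exception; // returns the next expiration time
  }

  static void onAppSubmission(DelegationToken token) {
    try {
      long nextExpiration = token.renew();
      scheduleRenewalAt(nextExpiration);   // normal path: keep auto-renewing
    } catch (Exception e) {
      // Renewal failed (e.g. the remote realm does not trust the RM's
      // credentials): log and skip auto-renewal, but do not fail the app.
      System.err.println("Token renewal failed, skipping auto-renewal: " + e);
    }
  }

  static void scheduleRenewalAt(long expirationTime) {
    // placeholder for the renewal timer
  }
}
{code}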



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-03-05 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349481#comment-14349481
 ] 

Sidharta Seethana commented on YARN-2981:
-

[~ashahab], the change seems fine to me. I would recommend adding a unit test 
(as Ravi mentions above) and changing the docker image specified in the 
example to a more recent one (I think there is a 2.6.0 image available).

thanks.

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch, 
> YARN-2981.patch
>
>
> This allows the yarn administrator to add a cluster-wide default docker image 
> that will be used when there is no per-job override of the docker image. With 
> this feature, it would be convenient for newer applications like slider to 
> launch inside a cluster-default docker container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.

2015-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349450#comment-14349450
 ] 

Hadoop QA commented on YARN-3287:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702897/timeline.patch
  against trunk revision 952640f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6869//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6869//console

This message is automatically generated.

> TimelineClient kerberos authentication failure uses wrong login context.
> 
>
> Key: YARN-3287
> URL: https://issues.apache.org/jira/browse/YARN-3287
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Daryn Sharp
> Attachments: timeline.patch
>
>
> TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause 
> failure for yarn clients to create timeline domains during job submission.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-03-05 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1809:

Attachment: YARN-1809.17.rebase.patch

rebase the patch

> Synchronize RM and Generic History Service Web-UIs
> --
>
> Key: YARN-1809
> URL: https://issues.apache.org/jira/browse/YARN-1809
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
> YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, 
> YARN-1809.14.patch, YARN-1809.15-rebase.patch, YARN-1809.15.patch, 
> YARN-1809.16.patch, YARN-1809.17.patch, YARN-1809.17.rebase.patch, 
> YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, 
> YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, 
> YARN-1809.9.patch
>
>
> After YARN-953, the web-UI of the generic history service provides more 
> information than that of the RM, e.g., the details about app attempts and containers. 
> It's good to provide similar web-UIs, but retrieve the data from separate 
> sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private

2015-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349415#comment-14349415
 ] 

Hadoop QA commented on YARN-3296:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702882/YARN-3296.1.patch
  against trunk revision 952640f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6867//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6867//console

This message is automatically generated.

> yarn.nodemanager.container-monitor.process-tree.class is configurable but 
> ResourceCalculatorProcessTree class is marked Private
> ---
>
> Key: YARN-3296
> URL: https://issues.apache.org/jira/browse/YARN-3296
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: YARN-3296.1.patch
>
>
> Given that someone can implement their custom plugin for resource monitoring 
> and configure the NM to use it, this class should be marked public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3297) Changes for ResourceCalculatorProcessTree in YARN-3122 could be done in a more compatible manner

2015-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349414#comment-14349414
 ] 

Hadoop QA commented on YARN-3297:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702881/YARN-3297.1.patch
  against trunk revision 952640f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6865//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6865//console

This message is automatically generated.

> Changes for ResourceCalculatorProcessTree in YARN-3122 could be done in a 
> more compatible manner
> 
>
> Key: YARN-3297
> URL: https://issues.apache.org/jira/browse/YARN-3297
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: YARN-3297.1.patch
>
>
> Related to YARN-3296, changes in YARN-3122 break any custom resource 
> monitoring plugin maintained outside of the YARN codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.

2015-03-05 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3287:
--
Attachment: timeline.patch

Posting this file on Daryn's behalf.

> TimelineClient kerberos authentication failure uses wrong login context.
> 
>
> Key: YARN-3287
> URL: https://issues.apache.org/jira/browse/YARN-3287
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Daryn Sharp
> Attachments: timeline.patch
>
>
> TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause 
> failure for yarn clients to create timeline domains during job submission.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group

2015-03-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349388#comment-14349388
 ] 

Jian He commented on YARN-3187:
---

Thanks [~gururaj],  some comments on the patch:
- “To specify list of users, %user can be used.”: I feel “list of users” is not 
quite accurate; how about “to specify whichever user, %user can be used”?
- “To specify queue name same as user list, %user can be used.”: how about 
“To specify a queue name that is the same as the user name, %user can be used.”?
- “To specify queue name same as primary group”: how about “To specify a queue 
name that is the same as the name of the primary group the user belongs to”?
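
A sketch of what such mappings could look like in capacity-scheduler.xml (the user, group, and queue names here are made up for illustration; mappings are evaluated in order, so the %user catch-all goes last, and %primary_group can similarly be used as the target queue name):
{code}
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- u:alice:engineering   -> user "alice" goes to queue "engineering"
       g:developers:devqueue -> users in group "developers" go to "devqueue"
       u:%user:%user         -> any other user goes to a queue named after the user -->
  <value>u:alice:engineering,g:developers:devqueue,u:%user:%user</value>
</property>
{code}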

> Documentation of Capacity Scheduler Queue mapping based on user or group
> 
>
> Key: YARN-3187
> URL: https://issues.apache.org/jira/browse/YARN-3187
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, documentation
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Gururaj Shetty
>  Labels: documentation
> Fix For: 2.6.0
>
> Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch
>
>
> YARN-2411 exposes a very useful feature {{support simple user and group 
> mappings to queues}}, but it's not captured in the documentation. So in this 
> JIRA we plan to document this feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-03-05 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349365#comment-14349365
 ] 

Vrushali C commented on YARN-3134:
--

Hi [~swagle]
bq. Do the responses to these API calls return any timeseries data?: 
_GetFlowByAppId_ and _GetAppDetails_
No, not for these two. These are specific to that hadoop application id which 
is unique. 

bq.  The set of access patterns do not cover query directly by a metricName. Is 
there a use case for this? (Note: General use case for driving graphs)
In hRaven, we usually fetch everything for a given flow and time range and 
allow filtering/searching in the UI for querying for a particular metric.  


bq. Do you use the hbase native timestamp for querying? This is an obvious 
optimization for timeseries data.
No, we don’t use that one at all. We have the submit time of a flow stored as 
the run id in the row key (as well as in columns). 
The row key for job history is
{code} 
cluster ! user ! flow name ! run id ! app id
{code}
where run id is the submit time of the flow. It is stored as an inverted long, 
which helps maintain the sorting such that most recent flow runs are stored 
first for that flow.  When querying for a time series or a time range, having this 
inverted long in the row key helps set the start and stop rows for the scan so that 
it's time-bound.  

Eg: 
https://github.com/twitter/hraven/blob/master/hraven-core/src/main/java/com/twitter/hraven/datasource/JobHistoryService.java#L277

bq. however how do you handle out of band data
I am sorry, I didn’t follow -- what do you mean by out of band data?
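
To illustrate the inverted run id in the row key described above, here is a rough sketch (a hypothetical helper, not hRaven code, which works on byte arrays rather than strings):
{code}
class FlowRowKeySketch {
  /** cluster ! user ! flow name ! inverted run id ! app id */
  static String rowKey(String cluster, String user, String flowName,
                       long flowSubmitTimeMillis, String appId) {
    // Inverting the submit time makes newer runs sort first in HBase.
    long invertedRunId = Long.MAX_VALUE - flowSubmitTimeMillis;
    return String.join("!", cluster, user, flowName,
        Long.toString(invertedRunId), appId);
  }

  public static void main(String[] args) {
    System.out.println(rowKey("cluster1", "alice", "wordcount",
        System.currentTimeMillis(), "application_1425600000000_0001"));
  }
}
{code}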


> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---
>
> Key: YARN-3134
> URL: https://issues.apache.org/jira/browse/YARN-3134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and make 
> it easy to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349356#comment-14349356
 ] 

Hadoop QA commented on YARN-3264:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702885/YARN-3264.007.patch
  against trunk revision 952640f.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6868//console

This message is automatically generated.

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch, YARN-3264.005.patch, 
> YARN-3264.006.patch, YARN-3264.007.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349352#comment-14349352
 ] 

Li Lu commented on YARN-3264:
-

Thanks [~vrushalic]! I applied the v6 patch and tested it locally, then diffed it 
against the v7 patch; the latest one (v7) LGTM. 

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch, YARN-3264.005.patch, 
> YARN-3264.006.patch, YARN-3264.007.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation

2015-03-05 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3264:
-
Attachment: YARN-3264.007.patch


Updated patch with all modified files and imports. 

> [Storage implementation] Create backing storage write interface and  a POC 
> only file based storage implementation
> -
>
> Key: YARN-3264
> URL: https://issues.apache.org/jira/browse/YARN-3264
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3264.001.patch, YARN-3264.002.patch, 
> YARN-3264.003.patch, YARN-3264.004.patch, YARN-3264.005.patch, 
> YARN-3264.006.patch, YARN-3264.007.patch
>
>
> For the PoC, need to create a backend impl for file based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1621) Add CLI to list rows of

2015-03-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349339#comment-14349339
 ] 

Bartosz Ługowski commented on YARN-1621:


[~Naganarasimha], thanks for review.

1. Done.
2. Done.
3. Good idea, there is no need to repeat application attempt ID for each 
container. Example:
{code}ApplicationAttempt-Id: appattempt_1234_0005_01
Total number of containers: 2
                 Container-Id                      Start Time                     Finish Time     State        Host  LOG-URL
  container_1234_0005_01_01  Thu Jan 01 01:00:01 +0100 1970  Thu Jan 01 01:00:01 +0100 1970  COMPLETE  host1:8881   logURL
  container_1234_0005_01_02  Thu Jan 01 01:00:01 +0100 1970  Thu Jan 01 01:00:01 +0100 1970       NEW  host2:8882   logURL{code}

> Add CLI to list rows of  state of container>
> --
>
> Key: YARN-1621
> URL: https://issues.apache.org/jira/browse/YARN-1621
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Tassapol Athiapinya
>Assignee: Bartosz Ługowski
> Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, 
> YARN-1621.4.patch, YARN-1621.5.patch
>
>
> As more applications are moved to YARN, we need generic CLI to list rows of 
> . Today 
> if YARN application running in a container does hang, there is no way to find 
> out more info because a user does not know where each attempt is running in.
> For each running application, it is useful to differentiate between 
> running/succeeded/failed/killed containers.
>  
> {code:title=proposed yarn cli}
> $ yarn application -list-containers -applicationId  [-containerState 
> ]
> where containerState is optional filter to list container in given state only.
>  can be running/succeeded/killed/failed/all.
> A user can specify more than one container state at once e.g. KILLED,FAILED.
> 
> {code}
> CLI should work with running application/completed application. If a 
> container runs many task attempts, all attempts should be shown. That will 
> likely be the case of Tez container-reuse application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private

2015-03-05 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-3296:
--
Attachment: YARN-3296.1.patch

> yarn.nodemanager.container-monitor.process-tree.class is configurable but 
> ResourceCalculatorProcessTree class is marked Private
> ---
>
> Key: YARN-3296
> URL: https://issues.apache.org/jira/browse/YARN-3296
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
> Attachments: YARN-3296.1.patch
>
>
> Given that someone can implement their custom plugin for resource monitoring 
> and configure the NM to use it, this class should be marked public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349336#comment-14349336
 ] 

Hadoop QA commented on YARN-2495:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702856/YARN-2495.20150305-1.patch
  against trunk revision 28b85a2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6864//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6864//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6864//console

This message is automatically generated.

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admins to specify labels on each NM; this covers
> - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
> using the script suggested by [~aw] (YARN-2729))
> - NM will send labels to RM via the ResourceTracker API
> - RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1621) Add CLI to list rows of

2015-03-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bartosz Ługowski updated YARN-1621:
---
Attachment: YARN-1621.5.patch

> Add CLI to list rows of  state of container>
> --
>
> Key: YARN-1621
> URL: https://issues.apache.org/jira/browse/YARN-1621
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Tassapol Athiapinya
>Assignee: Bartosz Ługowski
> Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, 
> YARN-1621.4.patch, YARN-1621.5.patch
>
>
> As more applications are moved to YARN, we need generic CLI to list rows of 
> . Today 
> if YARN application running in a container does hang, there is no way to find 
> out more info because a user does not know where each attempt is running in.
> For each running application, it is useful to differentiate between 
> running/succeeded/failed/killed containers.
>  
> {code:title=proposed yarn cli}
> $ yarn application -list-containers -applicationId  [-containerState 
> ]
> where containerState is optional filter to list container in given state only.
>  can be running/succeeded/killed/failed/all.
> A user can specify more than one container state at once e.g. KILLED,FAILED.
> 
> {code}
> CLI should work with running application/completed application. If a 
> container runs many task attempts, all attempts should be shown. That will 
> likely be the case of Tez container-reuse application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3297) Changes for ResourceCalculatorProcessTree in YARN-3122 could be done in a more compatible manner

2015-03-05 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-3297:
--
Attachment: YARN-3297.1.patch

> Changes for ResourceCalculatorProcessTree in YARN-3122 could be done in a 
> more compatible manner
> 
>
> Key: YARN-3297
> URL: https://issues.apache.org/jira/browse/YARN-3297
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
> Attachments: YARN-3297.1.patch
>
>
> Related to YARN-3296, changes in YARN-3122 break any custom resource 
> monitoring plugin maintained outside of the YARN codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3297) Changes for ResourceCalculatorProcessTree in YARN-3122 could be done in a more compatible manner

2015-03-05 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-3297:
--
Attachment: (was: YARN-3297.1.patch)

> Changes for ResourceCalculatorProcessTree in YARN-3122 could be done in a 
> more compatible manner
> 
>
> Key: YARN-3297
> URL: https://issues.apache.org/jira/browse/YARN-3297
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Related to YARN-3296, changes in YARN-3122 break any custom resource 
> monitoring plugin maintained outside of the YARN codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler

2015-03-05 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349323#comment-14349323
 ] 

Ray Chiang commented on YARN-2868:
--

RE: findbugs

None of the flagged issues are in files changed here.

RE: failing unit tests

These tests are passing in my tree.

> Add metric for initial container launch time to FairScheduler
> -
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
> YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
> YARN-2868.009.patch, YARN-2868.010.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3297) Changes for ResourceCalculatorProcessTree in YARN-3122 could be done in a more compatible manner

2015-03-05 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-3297:
--
Attachment: YARN-3297.1.patch

> Changes for ResourceCalculatorProcessTree in YARN-3122 could be done in a 
> more compatible manner
> 
>
> Key: YARN-3297
> URL: https://issues.apache.org/jira/browse/YARN-3297
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
> Attachments: YARN-3297.1.patch
>
>
> Related to YARN-3296, changes in YARN-3122 break any custom resource 
> monitoring plugin maintained outside of the YARN codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3297) Changes for ResourceCalculatorProcessTree in YARN-3122 could be done in a more compatible manner

2015-03-05 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-3297:
-

 Summary: Changes for ResourceCalculatorProcessTree in YARN-3122 
could be done in a more compatible manner
 Key: YARN-3297
 URL: https://issues.apache.org/jira/browse/YARN-3297
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah


Related to YARN-3296, changes in YARN-3122 break any custom resource monitoring 
plugin maintained outside of the YARN codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private

2015-03-05 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-3296:
-

 Summary: yarn.nodemanager.container-monitor.process-tree.class is 
configurable but ResourceCalculatorProcessTree class is marked Private
 Key: YARN-3296
 URL: https://issues.apache.org/jira/browse/YARN-3296
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah


Given that someone can implement their custom plugin for resource monitoring 
and configure the NM to use it, this class should be marked public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-03-05 Thread Siddharth Wagle (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349302#comment-14349302
 ] 

Siddharth Wagle commented on YARN-3134:
---

Thanks [~vrushalic] 
*Questions*: 
- Do the responses to these API calls return any timeseries data?: 
_GetFlowByAppId_ and _GetAppDetails_
- The set of access patterns do not cover query directly by a metricName. Is 
there a use case for this? (Note: General use case for driving graphs)
- Do you use the hbase native timestamp for querying? This is an obvious 
optimization for timeseries data, however how do you handle out of band data?

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---
>
> Key: YARN-3134
> URL: https://issues.apache.org/jira/browse/YARN-3134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and make 
> it easy to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349286#comment-14349286
 ] 

Hudson commented on YARN-2786:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7269 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7269/])
YARN-2786. Created a yarn cluster CLI and seeded with one command for listing 
node-labels collection. Contributed by Wangda Tan. (vinodkv: rev 
138c9cadee32da4d17be9835461bde646825d8d5)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ClusterCLI.java
* hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestClusterCLI.java


> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.7.0
>
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
> YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
> YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
> YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, 
> YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, 
> YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, 
> YARN-2786-20150304-1-trunk.patch, YARN-2786-20150304-2-branch2.patch, 
> YARN-2786-20150304-2-trunk-to-kick-jenkins.patch, 
> YARN-2786-20150304-2-trunk.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But that is not 
> enough; we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-03-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349238#comment-14349238
 ] 

Wangda Tan commented on YARN-3243:
--

mvn eclipse:eclipse passes locally.

> CapacityScheduler should pass headroom from parent to children to make sure 
> ParentQueue obey its capacity limits.
> -
>
> Key: YARN-3243
> URL: https://issues.apache.org/jira/browse/YARN-3243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3243.1.patch
>
>
> Now CapacityScheduler has some issues to make sure ParentQueue always obeys 
> its capacity limits, for example:
> 1) When allocating a container in a parent queue, it will only check 
> parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
> > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
> resource limit, as in the following example:
> {code}
> A  (usage=54, max=55)
>/ \
>   A1 A2 (usage=1, max=55)
> (usage=53, max=53)
> {code}
> Queue-A2 is able to allocate a container since its usage < max, but if we do 
> that, A's usage can exceed A.max.
> 2) When doing the continuous reservation check, the parent queue will only tell 
> its children "you need to unreserve *some* resource, so that I will be less than my 
> maximum resource", but it will not tell them how much resource needs to be 
> unreserved. This may lead to the parent queue exceeding its configured maximum 
> capacity as well.
> With YARN-3099/YARN-3124, now we have {{ResourceUsage}} class in each class, 
> *here is my proposal*:
> - ParentQueue will set its children's ResourceUsage.headroom, which means, 
> *maximum resource its children can allocate*.
> - ParentQueue will set its children's headroom to be (saying parent's name is 
> "qA"): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
> ancestors' capacity will be enforced as well (qA.headroom is set by qA's 
> parent).
> - {{needToUnReserve}} is not necessary; instead, children can get how much 
> resource needs to be unreserved to keep their parent within its resource limit.
> - More over, with this, YARN-3026 will make a clear boundary between 
> LeafQueue and FiCaSchedulerApp, headroom will consider user-limit, etc.
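
A minimal sketch of the proposed headroom propagation, assuming a simplified single-dimension resource (memory only) and hypothetical names:
{code}
final class HeadroomSketch {
  /** Headroom that parent queue qA passes down to each of its children. */
  static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
    return Math.min(parentHeadroom, parentMax - parentUsed);
  }

  public static void main(String[] args) {
    // Queue A (usage=54, max=55) with unbounded headroom from its own parent:
    // its children may allocate at most 1 more, so A cannot exceed A.max.
    System.out.println(childHeadroom(Long.MAX_VALUE, 55, 54)); // prints 1
  }
}
{code}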



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349194#comment-14349194
 ] 

Naganarasimha G R commented on YARN-3214:
-

Thanks for the clarification [~wangda],

> Add non-exclusive node labels 
> --
>
> Key: YARN-3214
> URL: https://issues.apache.org/jira/browse/YARN-3214
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: Non-exclusive-Node-Partition-Design.pdf
>
>
> Currently node labels partition the cluster into sub-clusters, so resources 
> cannot be shared between partitions. 
> With the current implementation of node labels we cannot use the cluster 
> optimally and the throughput of the cluster will suffer.
> We are proposing adding non-exclusive node labels:
> 1. Labeled apps get preference on labeled nodes.
> 2. If there is no ask for labeled resources, we can assign those nodes to 
> non-labeled apps.
> 3. If there is any future ask for those resources, we will preempt the 
> non-labeled apps and give the nodes back to labeled apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349173#comment-14349173
 ] 

Wangda Tan commented on YARN-3214:
--

Hi Naga,
Thanks for the review.
First I need to say that what we plan to do in this JIRA, together with the 
already committed patches of YARN-2492, is not only partitioning; it is 
partitioning plus tagging nodes (but each node can have at most one partition). 
This keeps the scheduler side consistent, and each queue can have a dedicated 
portion of capacity on different partitions.

Regarding {{And any given node can have at most one label of the first kind 
(one on which capacity can be specified ) and multiple tag kind of labels. App 
can specify label expression on tag kind of labels.}}:
We're working on a design for this -- supporting multiple labels on each node. 
We have had some internal discussions; there are some workable ways to do it, 
but they are not perfect (some limitations with these approaches, like you 
said). I will post a design doc once the proposal gets polished.

Wangda

> Add non-exclusive node labels 
> --
>
> Key: YARN-3214
> URL: https://issues.apache.org/jira/browse/YARN-3214
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: Non-exclusive-Node-Partition-Design.pdf
>
>
> Currently node labels partition the cluster into sub-clusters, so resources 
> cannot be shared between partitions. 
> With the current implementation of node labels we cannot use the cluster 
> optimally and the throughput of the cluster will suffer.
> We are proposing adding non-exclusive node labels:
> 1. Labeled apps get preference on labeled nodes.
> 2. If there is no ask for labeled resources, we can assign those nodes to 
> non-labeled apps.
> 3. If there is any future ask for those resources, we will preempt the 
> non-labeled apps and give the nodes back to labeled apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-05 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2495:

Attachment: YARN-2495.20150305-1.patch

Hi [~wangda],
Attaching a patch with the following changes:
* ??Name changes is->are?? 
* ??Make RegisterNodeManagerRequest consistent with NodeHeartbeatRequest?? 
* ??Change decentralized-configuration.enabled to input, accept value of 
 (by default is centralized)??  
* ??you can leave an empty provider implementation?? : taken care of in the 
patch, please check the approach
* ??stop NM if ResourceTrackerService informs it as invalid labels??  
* Analogous to the issue mentioned by Vinod (stop NM if ResourceTrackerService 
informs it of invalid labels), I feel that even during registration the RM 
should send out a shutdown with a proper message. I have handled this in the 
patch; a small sketch of the registration-time check follows below.
* When the RM receives invalid labels for a node, it will set the state of the 
RMNode to {{NodeState.UNHEALTHY}} using a new event, 
RMNodeInValidNodeLabelsUpdateEvent. Please advise whether to create a new state 
or whether NodeState.UNHEALTHY suffices.

Script-related issues will be discussed/fixed in the script JIRA (YARN-2729).
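To illustrate the registration-time check mentioned above, here is a minimal standalone sketch: the RM validates the labels an NM reports and answers with a shutdown action when they are unknown. All class, method, and label names are placeholders, not the ResourceTrackerService API:
{code}
import java.util.Set;

/** Sketch: reject NM registration when its reported labels are not valid. */
class LabelValidatingRegistration {
  enum NodeAction { NORMAL, SHUTDOWN }

  private final Set<String> clusterLabels;   // labels known to the RM

  LabelValidatingRegistration(Set<String> clusterLabels) {
    this.clusterLabels = clusterLabels;
  }

  NodeAction register(String nodeId, Set<String> reportedLabels) {
    if (!clusterLabels.containsAll(reportedLabels)) {
      System.out.println("Rejecting " + nodeId
          + ": unknown labels " + reportedLabels);
      return NodeAction.SHUTDOWN;   // NM stops with a proper message
    }
    return NodeAction.NORMAL;
  }

  public static void main(String[] args) {
    LabelValidatingRegistration rm =
        new LabelValidatingRegistration(Set.of("gpu", "ssd"));
    System.out.println(rm.register("nm1", Set.of("gpu")));   // NORMAL
    System.out.println(rm.register("nm2", Set.of("fpga")));  // SHUTDOWN
  }
}
{code}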

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495_20141022.1.patch
>
>
> The target of this JIRA is to allow admins to specify labels on each NM; this covers:
> - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
> using the script suggested by [~aw] (YARN-2729))
> - NM will send labels to RM via the ResourceTracker API
> - RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-03-05 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349159#comment-14349159
 ] 

Ravi Prakash commented on YARN-2981:


I believe this is a good change. Could you please add a unit test? I'm +1 on 
the change after that.

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch, 
> YARN-2981.patch
>
>
> This allows the yarn administrator to add a cluster-wide default docker image 
> that will be used when there is no per-job override of the docker image. With 
> this feature, it would be convenient for newer applications like Slider to 
> launch inside a cluster-default docker container.
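A minimal sketch of the fallback behaviour described above (a per-job image overrides the cluster-wide default). The property names below are placeholders, not the actual configuration keys introduced by the patch:
{code}
import java.util.Map;

/** Illustrative image-selection logic: per-job override wins, otherwise the
 *  cluster-wide default applies. Property names are placeholders. */
class DockerImageResolver {
  static final String CLUSTER_DEFAULT_KEY = "example.docker.default-image";
  static final String JOB_OVERRIDE_KEY = "example.docker.job-image";

  static String resolveImage(Map<String, String> clusterConf,
                             Map<String, String> jobEnv) {
    String jobImage = jobEnv.get(JOB_OVERRIDE_KEY);
    if (jobImage != null && !jobImage.isEmpty()) {
      return jobImage;                       // per-job override
    }
    String defaultImage = clusterConf.get(CLUSTER_DEFAULT_KEY);
    if (defaultImage != null && !defaultImage.isEmpty()) {
      return defaultImage;                   // cluster-wide default
    }
    throw new IllegalStateException("No docker image configured");
  }

  public static void main(String[] args) {
    Map<String, String> cluster = Map.of(CLUSTER_DEFAULT_KEY, "centos:7");
    System.out.println(resolveImage(cluster, Map.of()));                // centos:7
    System.out.println(resolveImage(cluster,
        Map.of(JOB_OVERRIDE_KEY, "my-app:latest")));                    // my-app:latest
  }
}
{code}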



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI

2015-03-05 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349068#comment-14349068
 ] 

Varun Vasudev commented on YARN-3293:
-

Some metrics to track - number of allocations, number of completed containers, 
number of node heartbeats, last container allocation time, last completed 
container time, last heartbeat time from any container
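A rough sketch of a holder for such health metrics; the class and field names are illustrative only and do not reflect any eventual implementation:
{code}
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative holder for the scheduler health metrics listed above. */
class SchedulerHealthMetrics {
  private final AtomicLong allocations = new AtomicLong();
  private final AtomicLong completedContainers = new AtomicLong();
  private final AtomicLong nodeHeartbeats = new AtomicLong();
  private volatile long lastAllocationTimeMs;
  private volatile long lastCompletedContainerTimeMs;
  private volatile long lastHeartbeatTimeMs;

  void onAllocation() {
    allocations.incrementAndGet();
    lastAllocationTimeMs = System.currentTimeMillis();
  }

  void onContainerCompleted() {
    completedContainers.incrementAndGet();
    lastCompletedContainerTimeMs = System.currentTimeMillis();
  }

  void onNodeHeartbeat() {
    nodeHeartbeats.incrementAndGet();
    lastHeartbeatTimeMs = System.currentTimeMillis();
  }

  /** E.g. surfaced on the web UI: how long since the scheduler last allocated. */
  long millisSinceLastAllocation() {
    return lastAllocationTimeMs == 0 ? -1
        : System.currentTimeMillis() - lastAllocationTimeMs;
  }
}
{code}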

> Track and display capacity scheduler health metrics in web UI
> -
>
> Key: YARN-3293
> URL: https://issues.apache.org/jira/browse/YARN-3293
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> It would be good to display metrics that let users know about the health of 
> the capacity scheduler in the web UI. Today it is hard to get an idea if the 
> capacity scheduler is functioning correctly. Metrics such as the time for the 
> last allocation, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-05 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349038#comment-14349038
 ] 

Varun Vasudev commented on YARN-2190:
-

+1 for the latest patch.

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.11.patch, YARN-2190.2.patch, YARN-2190.3.patch, 
> YARN-2190.4.patch, YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, 
> YARN-2190.8.patch, YARN-2190.9.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects and thus provides resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2616) Add CLI client to the registry to list, view and manipulate entries

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348985#comment-14348985
 ] 

Hudson commented on YARN-2616:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7268 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7268/])
Update CHANGES.txt for YARN-2616 to fix indentation. (ozawa: rev 
28b85a2116c3061fcb739aaca0dff89ff2a925e4)
* hadoop-yarn-project/CHANGES.txt


> Add CLI client to the registry to list, view and manipulate entries
> ---
>
> Key: YARN-2616
> URL: https://issues.apache.org/jira/browse/YARN-2616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Akshay Radia
> Fix For: 2.7.0
>
> Attachments: YARN-2616-003.patch, YARN-2616-008.patch, 
> YARN-2616-008.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, 
> yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch, yarn-2616-v7.patch
>
>
> registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348964#comment-14348964
 ] 

Hudson commented on YARN-3122:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2073 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2073/])
YARN-3122. Metrics for container's actual CPU usage. (Anubhav Dhoot via kasha) 
(kasha: rev 53947f37c7a84a84ef4ab1a3cab63ff27c078385)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/CpuTimeTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java


> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.7.0
>
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 
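For context, per-container CPU metrics are typically derived by sampling the cumulative CPU time of the container's process tree and dividing the delta by wall-clock time. A simplified, standalone sketch of that calculation (not the actual ProcfsBasedProcessTree/CpuTimeTracker code):
{code}
/** Simplified CPU-usage tracker: percentage of one core used since last sample. */
class SimpleCpuTracker {
  private long lastCumulativeCpuMs = -1;
  private long lastSampleTimeMs = -1;

  /**
   * @param cumulativeCpuMs total CPU time consumed by the process tree so far
   * @param nowMs           current wall-clock time
   * @return CPU usage in percent of a single core since the previous sample,
   *         or -1 on the first sample.
   */
  float update(long cumulativeCpuMs, long nowMs) {
    float usage = -1f;
    if (lastCumulativeCpuMs >= 0 && nowMs > lastSampleTimeMs) {
      long cpuDelta = cumulativeCpuMs - lastCumulativeCpuMs;
      long wallDelta = nowMs - lastSampleTimeMs;
      usage = 100f * cpuDelta / wallDelta;
    }
    lastCumulativeCpuMs = cumulativeCpuMs;
    lastSampleTimeMs = nowMs;
    return usage;
  }

  public static void main(String[] args) {
    SimpleCpuTracker t = new SimpleCpuTracker();
    t.update(1_000, 10_000);                      // first sample, returns -1
    System.out.println(t.update(1_500, 11_000));  // 500ms CPU over 1000ms wall = 50.0
  }
}
{code}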



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-03-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348975#comment-14348975
 ] 

Sunil G commented on YARN-3136:
---

The errors seem unrelated.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 
> 0003-YARN-3136.patch, 0004-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.
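One common way to remove such a bottleneck is to keep the per-application data in a concurrent structure so the read path does not need the scheduler's monitor. A standalone sketch under that assumption (illustrative names, not the actual fix in the attached patches):
{code}
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Sketch: transferred containers kept in a concurrent map so readers avoid
 *  the scheduler-wide lock. Class and method names are illustrative. */
class TransferredContainers {
  private final ConcurrentMap<String, List<String>> byAppAttempt =
      new ConcurrentHashMap<>();

  /** Called from the (already synchronized) scheduling path. */
  void recordTransferred(String appAttemptId, String containerId) {
    byAppAttempt
        .computeIfAbsent(appAttemptId, k -> new CopyOnWriteArrayList<>())
        .add(containerId);
  }

  /** Called during AM registration; no scheduler lock required. */
  List<String> getTransferredContainers(String appAttemptId) {
    return byAppAttempt.getOrDefault(appAttemptId, Collections.emptyList());
  }
}
{code}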



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348963#comment-14348963
 ] 

Hudson commented on YARN-3131:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2073 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2073/])
YARN-3131. YarnClientImpl should check FAILED and KILLED state in 
submitApplication. Contributed by Chang Li (jlowe: rev 
03cc22945e5d4e953c06a313b8158389554a6aa7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


> YarnClientImpl should check FAILED and KILLED state in submitApplication
> 
>
> Key: YARN-3131
> URL: https://issues.apache.org/jira/browse/YARN-3131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.0
>
> Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
> yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, 
> yarn_3131_v6.patch, yarn_3131_v7.patch
>
>
> Just ran into an issue when submitting a job to a non-existent queue: 
> YarnClient raises no exception. Though the job does get submitted 
> successfully and just fails immediately afterwards, it would be better if 
> YarnClient could handle the immediate-failure situation the way YarnRunner does.
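In essence, the requested behaviour is to poll the application report after submission and fail fast if the application reaches FAILED or KILLED. A standalone sketch with placeholder types (not the actual YarnClientImpl change):
{code}
import java.util.concurrent.TimeUnit;

/** Sketch of fail-fast submission: poll the app state and throw if it ends up
 *  FAILED or KILLED instead of ACCEPTED/RUNNING. Types are placeholders. */
class SubmitWithCheck {
  enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FAILED, KILLED }

  interface Cluster {
    String submit(String queue) throws Exception;
    AppState getState(String appId) throws Exception;
  }

  static String submitAndVerify(Cluster cluster, String queue,
                                long timeoutMs) throws Exception {
    String appId = cluster.submit(queue);
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      AppState state = cluster.getState(appId);
      switch (state) {
        case FAILED:
        case KILLED:
          throw new IllegalStateException(
              "Application " + appId + " ended in state " + state
              + " right after submission (e.g. a bad queue)");
        case ACCEPTED:
        case RUNNING:
          return appId;                       // submission really succeeded
        default:
          TimeUnit.MILLISECONDS.sleep(200);   // still NEW/SUBMITTED, keep polling
      }
    }
    throw new IllegalStateException("Timed out waiting for " + appId);
  }
}
{code}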



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348953#comment-14348953
 ] 

Hudson commented on YARN-3242:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2073 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2073/])
YARN-3242. Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving 
events for old client. (Zhihai Xu via kasha) (kasha: rev 
8d88691d162f87f95c9ed7e0a569ef08e8385d4f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java


> Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events 
> for old client
> -
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> Old ZK client session watcher event messed up new ZK client session due to 
> ZooKeeper asynchronously closing client session.
> The watcher event from old ZK client session can still be sent to 
> ZKRMStateStore after the old  ZK client session is closed.
> This will cause a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session which was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after 
> the new session is connected, the zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive SyncConnected event from new session 
> because new session is already in SyncConnected state and it won't send 
> SyncConnected event until it is disconnected and connected again.
> Then we will see all the ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until  RM shutdown.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even 
> after receiving eventOfDeath, EventThread will still process all the events 
> until the waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> 
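The core of the fix implied by the description above is to ignore watcher events that do not belong to the currently active session. A standalone sketch of that guard, using placeholder types rather than the real ZooKeeper/ZKRMStateStore classes:
{code}
/** Sketch: drop watcher events from stale sessions. Types are placeholders. */
class SessionGuardedWatcher {
  interface Event {
    long sessionId();
    String type();
  }

  private volatile long activeSessionId;

  void onNewSession(long sessionId) {
    activeSessionId = sessionId;
  }

  void process(Event event) {
    // Guard: a Disconnected/Expired event from an old, already-closed session
    // must not clear the client that belongs to the new session.
    if (event.sessionId() != activeSessionId) {
      System.out.println("Ignoring stale event " + event.type()
          + " from session " + event.sessionId());
      return;
    }
    System.out.println("Handling " + event.type()
        + " for current session " + event.sessionId());
  }
}
{code}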

[jira] [Commented] (YARN-3249) Add a "kill application" button to Resource Manager's Web UI

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348958#comment-14348958
 ] 

Hudson commented on YARN-3249:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2073 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2073/])
YARN-3249. Add a 'kill application' button to Resource Manager's Web UI. 
Contributed by Ryu Kobayashi. (ozawa: rev 
1b672096121fef775572b517d4f5721997abbac6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* hadoop-yarn-project/CHANGES.txt


> Add a "kill application" button to Resource Manager's Web UI
> 
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, 
> killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler: Changing queueMaxRunningApps interferes with pending jobs

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348965#comment-14348965
 ] 

Hudson commented on YARN-3231:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2073 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2073/])
YARN-3231. FairScheduler: Changing queueMaxRunningApps interferes with pending 
jobs. (Siqi Li via kasha) (kasha: rev 22426a1c9f4bd616558089b6862fd34ab42d19a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler: Changing queueMaxRunningApps interferes with pending jobs
> 
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
> YARN-3231.v3.patch, YARN-3231.v4.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increase the limit, the 
> pending jobs are not assigned any resources and are stuck forever.
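The expected behaviour boils down to re-examining the pending queue whenever maxRunningApps is raised and promoting apps that now fit. A minimal sketch of that update step (illustrative names, not the MaxRunningAppsEnforcer change itself):
{code}
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch: when maxRunningApps is raised, promote waiting apps immediately. */
class RunningAppsLimiter {
  private int maxRunningApps;
  private int runningApps;
  private final Deque<String> pending = new ArrayDeque<>();

  RunningAppsLimiter(int maxRunningApps) {
    this.maxRunningApps = maxRunningApps;
  }

  void submit(String appId) {
    if (runningApps < maxRunningApps) {
      runningApps++;
      System.out.println(appId + " started");
    } else {
      pending.add(appId);               // queued behind the limit
    }
  }

  /** The important part: on a config reload, re-check the pending queue. */
  void setMaxRunningApps(int newLimit) {
    maxRunningApps = newLimit;
    while (runningApps < maxRunningApps && !pending.isEmpty()) {
      runningApps++;
      System.out.println(pending.poll() + " promoted after limit increase");
    }
  }
}
{code}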



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]

2015-03-05 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2003:
--
Attachment: 0004-YARN-2003.patch

Rebasing as per the changes in the priority manager class.

> Support to process Job priority from Submission Context in 
> AppAttemptAddedSchedulerEvent [RM side]
> --
>
> Key: YARN-2003
> URL: https://issues.apache.org/jira/browse/YARN-2003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 
> 0003-YARN-2003.patch, 0004-YARN-2003.patch
>
>
> AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from 
> Submission Context and store.
> Later this can be used by Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-03-05 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2004:
--
Attachment: 0004-YARN-2004.patch

Rebasing the patch

> Priority scheduling support in Capacity scheduler
> -
>
> Key: YARN-2004
> URL: https://issues.apache.org/jira/browse/YARN-2004
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 
> 0003-YARN-2004.patch, 0004-YARN-2004.patch
>
>
> Based on the priority of the application, Capacity Scheduler should be able 
> to give preference to applications while doing scheduling.
> The comparator applicationComparator can be changed as below (a small sketch 
> follows after the list).
> 
> 1. Check for application priority. If a priority is available, then order 
> the highest-priority job first.
> 2. Otherwise continue with the existing logic, such as App ID comparison and 
> then TimeStamp comparison.
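The two-step comparison above can be expressed as a comparator. A standalone sketch with a placeholder application type; it assumes a larger numeric value means higher priority, which may differ from the final implementation:
{code}
import java.util.Comparator;

/** Sketch of the comparator change: higher priority first, then app id. */
class PriorityOrdering {
  static final class App {
    final int priority;   // assumption: larger value = higher priority
    final long appId;     // lower id = submitted earlier
    App(int priority, long appId) {
      this.priority = priority;
      this.appId = appId;
    }
  }

  static final Comparator<App> APP_ORDER =
      Comparator.comparingInt((App a) -> a.priority).reversed()  // step 1: priority
                .thenComparingLong(a -> a.appId);                // step 2: app id

  public static void main(String[] args) {
    App older = new App(0, 1);
    App urgent = new App(5, 2);
    System.out.println(APP_ORDER.compare(urgent, older) < 0);  // true: urgent first
  }
}
{code}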



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1434#comment-1434
 ] 

Hudson commented on YARN-3131:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #123 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/123/])
YARN-3131. YarnClientImpl should check FAILED and KILLED state in 
submitApplication. Contributed by Chang Li (jlowe: rev 
03cc22945e5d4e953c06a313b8158389554a6aa7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt


> YarnClientImpl should check FAILED and KILLED state in submitApplication
> 
>
> Key: YARN-3131
> URL: https://issues.apache.org/jira/browse/YARN-3131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.0
>
> Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
> yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, 
> yarn_3131_v6.patch, yarn_3131_v7.patch
>
>
> Just ran into an issue when submitting a job to a non-existent queue: 
> YarnClient raises no exception. Though the job does get submitted 
> successfully and just fails immediately afterwards, it would be better if 
> YarnClient could handle the immediate-failure situation the way YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running' applications

2015-03-05 Thread Sumit Mohanty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348892#comment-14348892
 ] 

Sumit Mohanty commented on YARN-3154:
-

[~xgong], do the long running applications such as HBase on YARN using Slider 
need to do anything to make sure that partial logs are uploaded?

> Should not upload partial logs for MR jobs or other "short-running' 
> applications 
> -
>
> Key: YARN-3154
> URL: https://issues.apache.org/jira/browse/YARN-3154
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch
>
>
> Currently, if we are running an MR job and we do not set the log interval 
> properly, its partial logs will be uploaded and then removed from the 
> local filesystem, which is not right.
> We should only upload partial logs for LRS applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler: Changing queueMaxRunningApps interferes with pending jobs

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348890#comment-14348890
 ] 

Hudson commented on YARN-3231:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #123 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/123/])
YARN-3231. FairScheduler: Changing queueMaxRunningApps interferes with pending 
jobs. (Siqi Li via kasha) (kasha: rev 22426a1c9f4bd616558089b6862fd34ab42d19a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler: Changing queueMaxRunningApps interferes with pending jobs
> 
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
> YARN-3231.v3.patch, YARN-3231.v4.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increase the limit, the 
> pending jobs are not assigned any resources and are stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add a "kill application" button to Resource Manager's Web UI

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348883#comment-14348883
 ] 

Hudson commented on YARN-3249:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #123 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/123/])
YARN-3249. Add a 'kill application' button to Resource Manager's Web UI. 
Contributed by Ryu Kobayashi. (ozawa: rev 
1b672096121fef775572b517d4f5721997abbac6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* hadoop-yarn-project/CHANGES.txt


> Add a "kill application" button to Resource Manager's Web UI
> 
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, 
> killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348878#comment-14348878
 ] 

Hudson commented on YARN-3242:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #123 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/123/])
YARN-3242. Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving 
events for old client. (Zhihai Xu via kasha) (kasha: rev 
8d88691d162f87f95c9ed7e0a569ef08e8385d4f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java


> Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events 
> for old client
> -
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> Old ZK client session watcher event messed up new ZK client session due to 
> ZooKeeper asynchronously closing client session.
> The watcher event from old ZK client session can still be sent to 
> ZKRMStateStore after the old  ZK client session is closed.
> This will cause a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session which was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after 
> the new session is connected, the zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive SyncConnected event from new session 
> because new session is already in SyncConnected state and it won't send 
> SyncConnected event until it is disconnected and connected again.
> Then we will see all the ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until  RM shutdown.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even 
> after receiving eventOfDeath, EventThread will still process all the events 
> until the waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();

[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348889#comment-14348889
 ] 

Hudson commented on YARN-3122:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #123 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/123/])
YARN-3122. Metrics for container's actual CPU usage. (Anubhav Dhoot via kasha) 
(kasha: rev 53947f37c7a84a84ef4ab1a3cab63ff27c078385)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/CpuTimeTracker.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java


> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.7.0
>
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-05 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348877#comment-14348877
 ] 

Chang Li commented on YARN-3267:


[~pramachandran] [~zjshen] Could you help review my patch? As for the -1 in the 
test run, I believe it's unrelated to my changes.

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_V1.patch, YARN_3267_V2.patch, 
> YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, 
> YARN_3267_WIP3.patch
>
>
> While fetching entities from the timeline server, the limit is applied to the 
> entities fetched from leveldb, and the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> This could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.
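The fix amounts to applying the ACL check while scanning and counting only the entities the caller is allowed to see, instead of cutting the scan off at the limit first. A standalone sketch of that ordering (placeholder types, not the TimelineDataManager code):
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Sketch: count toward the limit only entities that pass the ACL check. */
class FilteredFetch {
  static <T> List<T> fetch(Iterator<T> store, Predicate<T> aclAllows, int limit) {
    List<T> result = new ArrayList<>();
    while (store.hasNext() && result.size() < limit) {
      T entity = store.next();
      if (aclAllows.test(entity)) {   // ACL applied before the limit bites
        result.add(entity);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> stored = List.of("other-1", "other-2", "mine-1", "mine-2");
    // With limit=2, limiting first would return nothing visible to this caller;
    // filtering first returns both of the caller's entities.
    System.out.println(fetch(stored.iterator(), s -> s.startsWith("mine"), 2));
  }
}
{code}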



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348839#comment-14348839
 ] 

Naganarasimha G R commented on YARN-3214:
-

Hi [~wangda], 
I had a query (not sure whether this is the JIRA on which I should discuss it, 
though). IIUC, when the labels requirement started we were trying to cater to 
2 kinds of requirements:
# Similar to the current JIRA: label the nodes to partition the cluster and 
ensure that a few queues/users get a particular partition of nodes with high 
priority (multi-tenant scenario).
# Tagging the nodes with particular labels (like high-MEM nodes, more CPU 
cores, has more or a particular kind of GPUs, has a particular library version, 
Java version, etc.) and launching apps based on these tags.

Currently it seems like we are only focusing on the first kind and almost not 
supporting the second one at all (we are not even accepting more than one label 
for a node). So I was thinking we could support 2 kinds of labels: the first 
kind of labels, on which we will be able to support capacity, and the second 
kind of label for tagging. And any given node can have at most one label of the 
first kind (one on which capacity can be specified ) and multiple tag kind of 
labels. App can specify label expression on tag kind of labels.
Correct me if my understanding is wrong or if there can still be limitations 
with the above approach too; a small data-model sketch follows below.
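For illustration only, the "one partition label plus any number of tags" idea could be captured in a tiny data model like the sketch below; all names are hypothetical and this is not a proposal for the actual API:
{code}
import java.util.Optional;
import java.util.Set;

/** Sketch of the proposed model: at most one partition per node, many tags. */
class NodeLabelModel {
  static final class NodeLabels {
    final Optional<String> partition;  // capacity can be configured on this
    final Set<String> tags;            // e.g. "high-mem", "java8", "gpu-k80"

    NodeLabels(Optional<String> partition, Set<String> tags) {
      this.partition = partition;
      this.tags = tags;
    }

    /** An app's tag expression is satisfied if all requested tags are present. */
    boolean satisfies(Set<String> requestedTags) {
      return tags.containsAll(requestedTags);
    }
  }

  public static void main(String[] args) {
    NodeLabels node = new NodeLabels(Optional.of("prod"),
                                     Set.of("high-mem", "java8"));
    System.out.println(node.satisfies(Set.of("high-mem")));  // true
    System.out.println(node.satisfies(Set.of("gpu")));       // false
  }
}
{code}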


> Add non-exclusive node labels 
> --
>
> Key: YARN-3214
> URL: https://issues.apache.org/jira/browse/YARN-3214
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: Non-exclusive-Node-Partition-Design.pdf
>
>
> Currently node labels partition the cluster into sub-clusters, so resources 
> cannot be shared between partitions. 
> With the current implementation of node labels we cannot use the cluster 
> optimally and the throughput of the cluster will suffer.
> We are proposing adding non-exclusive node labels:
> 1. Labeled apps get preference on labeled nodes.
> 2. If there is no ask for labeled resources, we can assign those nodes to 
> non-labeled apps.
> 3. If there is any future ask for those resources, we will preempt the 
> non-labeled apps and give the nodes back to labeled apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348821#comment-14348821
 ] 

Hudson commented on YARN-3131:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/114/])
YARN-3131. YarnClientImpl should check FAILED and KILLED state in 
submitApplication. Contributed by Chang Li (jlowe: rev 
03cc22945e5d4e953c06a313b8158389554a6aa7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* hadoop-yarn-project/CHANGES.txt


> YarnClientImpl should check FAILED and KILLED state in submitApplication
> 
>
> Key: YARN-3131
> URL: https://issues.apache.org/jira/browse/YARN-3131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.0
>
> Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
> yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, 
> yarn_3131_v6.patch, yarn_3131_v7.patch
>
>
> Just ran into an issue when submitting a job to a non-existent queue: 
> YarnClient raises no exception. Though the job does get submitted 
> successfully and just fails immediately afterwards, it would be better if 
> YarnClient could handle the immediate-failure situation the way YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler: Changing queueMaxRunningApps interferes with pending jobs

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348823#comment-14348823
 ] 

Hudson commented on YARN-3231:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/114/])
YARN-3231. FairScheduler: Changing queueMaxRunningApps interferes with pending 
jobs. (Siqi Li via kasha) (kasha: rev 22426a1c9f4bd616558089b6862fd34ab42d19a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> FairScheduler: Changing queueMaxRunningApps interferes with pending jobs
> 
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
> YARN-3231.v3.patch, YARN-3231.v4.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increase the limit, the 
> pending jobs are not assigned any resources and are stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348811#comment-14348811
 ] 

Hudson commented on YARN-3242:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/114/])
YARN-3242. Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving 
events for old client. (Zhihai Xu via kasha) (kasha: rev 
8d88691d162f87f95c9ed7e0a569ef08e8385d4f)
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java


> Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events 
> for old client
> -
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> Old ZK client session watcher event messed up new ZK client session due to 
> ZooKeeper asynchronously closing client session.
> The watcher event from old ZK client session can still be sent to 
> ZKRMStateStore after the old  ZK client session is closed.
> This will cause a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session which was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after 
> the new session is connected, the zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive SyncConnected event from new session 
> because new session is already in SyncConnected state and it won't send 
> SyncConnected event until it is disconnected and connected again.
> Then we will see all the ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until  RM shutdown.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even 
> after receiving eventOfDeath, EventThread will still process all the events 
> until the waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> 

[jira] [Commented] (YARN-3249) Add a "kill application" button to Resource Manager's Web UI

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348816#comment-14348816
 ] 

Hudson commented on YARN-3249:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/114/])
YARN-3249. Add a 'kill application' button to Resource Manager's Web UI. 
Contributed by Ryu Kobayashi. (ozawa: rev 
1b672096121fef775572b517d4f5721997abbac6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Add a "kill application" button to Resource Manager's Web UI
> 
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, 
> killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348822#comment-14348822
 ] 

Hudson commented on YARN-3122:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/114/])
YARN-3122. Metrics for container's actual CPU usage. (Anubhav Dhoot via kasha) 
(kasha: rev 53947f37c7a84a84ef4ab1a3cab63ff27c078385)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/CpuTimeTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java


> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.7.0
>
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 
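
As background on what "actual CPU usage" means here: a common way to derive it (and what a cumulative-CPU-time tracker such as the CpuTimeTracker listed above can be built on) is to sample the total CPU time the container's process tree has consumed and divide the delta by the elapsed wall-clock time. The sketch below is illustrative only; the class and method names are assumptions, not code from this patch.
{code}
// Illustrative sketch: deriving per-container CPU usage from two samples of
// cumulative CPU time. Names are hypothetical, not from the YARN-3122 patch.
public final class CpuUsageEstimator {

  private long lastCumulativeCpuMs = -1; // total CPU time consumed at last sample
  private long lastSampleTimeMs = -1;    // wall-clock time of last sample

  /**
   * @param cumulativeCpuMs total CPU time the process tree has used so far, in ms
   * @param nowMs           current wall-clock time, in ms
   * @return CPU usage since the previous sample as a percentage of one core,
   *         or -1 if there is no previous sample yet
   */
  public float updateAndGetCpuUsagePercent(long cumulativeCpuMs, long nowMs) {
    float percent = -1f;
    if (lastCumulativeCpuMs >= 0 && nowMs > lastSampleTimeMs) {
      long cpuDeltaMs = cumulativeCpuMs - lastCumulativeCpuMs;
      long wallDeltaMs = nowMs - lastSampleTimeMs;
      percent = (cpuDeltaMs * 100.0f) / wallDeltaMs; // 200 means two cores busy
    }
    lastCumulativeCpuMs = cumulativeCpuMs;
    lastSampleTimeMs = nowMs;
    return percent;
  }
}
{code}
Dividing the result by the number of cores on the node would give utilization relative to the whole node rather than to a single core.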



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348778#comment-14348778
 ] 

Hudson commented on YARN-3242:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2055 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2055/])
YARN-3242. Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving 
events for old client. (Zhihai Xu via kasha) (kasha: rev 
8d88691d162f87f95c9ed7e0a569ef08e8385d4f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
* hadoop-yarn-project/CHANGES.txt


> Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events 
> for old client
> -
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can corrupt the new ZK client 
> session, because ZooKeeper closes client sessions asynchronously.
> Watcher events from the old ZK client session can still be delivered to 
> ZKRMStateStore after that session has been closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> There is only one ZKRMStateStore, but there can be multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent does not check whether a watcher 
> event comes from the current session, so events from an old, already-closed 
> ZK client session are still processed.
> For example, if a Disconnected event from the old session is received after 
> the new session is connected, zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> ZKRMStateStore will then never receive a SyncConnected event from the new 
> session, because the new session is already in the SyncConnected state and 
> will not send that event again until it is disconnected and reconnected.
> From that point on, all ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until the RM is shut down.
> The following ZooKeeper code (ClientCnxn#EventThread) shows that even after 
> receiving eventOfDeath, the EventThread keeps processing events until the 
> waitingEvents queue is empty; a sketch of a per-session guard follows after 
> the quoted code.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setT
> {code}
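
To make the failure mode described above concrete, here is a minimal sketch of the kind of per-session guard the description is asking for: the watcher remembers which ZooKeeper client it was registered on and drops events once that client is no longer the active one. The class, interface, and method names below are hypothetical and are not taken from the YARN-3242 patch.
{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical sketch of a session-aware watcher: events from a client that is
// no longer the store's active client are ignored instead of being processed.
class SessionAwareWatcher implements Watcher {

  interface StateStore {
    ZooKeeper getActiveClient();
    void processWatchEvent(WatchedEvent event);
  }

  private final ZooKeeper sourceClient; // the client this watcher was registered on
  private final StateStore store;

  SessionAwareWatcher(ZooKeeper sourceClient, StateStore store) {
    this.sourceClient = sourceClient;
    this.store = store;
  }

  @Override
  public void process(WatchedEvent event) {
    // Drop Disconnected/Expired/etc. events coming from an old, closed session.
    if (sourceClient != store.getActiveClient()) {
      return;
    }
    store.processWatchEvent(event);
  }
}
{code}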

[jira] [Commented] (YARN-3249) Add a "kill application" button to Resource Manager's Web UI

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348784#comment-14348784
 ] 

Hudson commented on YARN-3249:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2055 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2055/])
YARN-3249. Add a 'kill application' button to Resource Manager's Web UI. 
Contributed by Ryu Kobayashi. (ozawa: rev 
1b672096121fef775572b517d4f5721997abbac6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Add a "kill application" button to Resource Manager's Web UI
> 
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, 
> killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill an application from the Web UI, similar to what the JobTracker's Web UI offers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348790#comment-14348790
 ] 

Hudson commented on YARN-3122:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2055 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2055/])
YARN-3122. Metrics for container's actual CPU usage. (Anubhav Dhoot via kasha) 
(kasha: rev 53947f37c7a84a84ef4ab1a3cab63ff27c078385)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/CpuTimeTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java


> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.7.0
>
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348789#comment-14348789
 ] 

Hudson commented on YARN-3131:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2055 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2055/])
YARN-3131. YarnClientImpl should check FAILED and KILLED state in 
submitApplication. Contributed by Chang Li (jlowe: rev 
03cc22945e5d4e953c06a313b8158389554a6aa7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


> YarnClientImpl should check FAILED and KILLED state in submitApplication
> 
>
> Key: YARN-3131
> URL: https://issues.apache.org/jira/browse/YARN-3131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Fix For: 2.7.0
>
> Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
> yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, 
> yarn_3131_v6.patch, yarn_3131_v7.patch
>
>
> Just ran into an issue where submitting a job to a non-existent queue raises 
> no exception from YarnClient. The job is accepted and then fails immediately 
> afterwards; it would be better if YarnClient handled this immediate-failure 
> case the way YarnRunner does. A sketch of such a post-submit check follows 
> below.
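
For illustration, a client-side check along the lines the description asks for could poll the application report right after submission and fail fast when the application has already gone to FAILED or KILLED. This is only a sketch against the public YarnClient API, not the YARN-3131 patch itself.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

// Sketch: after submitting, poll the report and surface an immediate failure
// (e.g. submission to a non-existent queue) instead of returning silently.
final class SubmitChecks {

  static void failFastIfRejected(YarnClient client, ApplicationId appId)
      throws Exception {
    ApplicationReport report = client.getApplicationReport(appId);
    YarnApplicationState state = report.getYarnApplicationState();
    if (state == YarnApplicationState.FAILED
        || state == YarnApplicationState.KILLED) {
      throw new Exception("Application " + appId + " was rejected: "
          + report.getDiagnostics());
    }
  }
}
{code}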



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

