[jira] [Created] (YARN-2304) TestRMWebServices* fails intermittently
Tsuyoshi OZAWA created YARN-2304: Summary: TestRMWebServices* fails intermittently Key: YARN-2304 URL: https://issues.apache.org/jira/browse/YARN-2304 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: test-failure-log-RMWeb.txt The test fails intermittently because of a bind exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2304) TestRMWebServices* fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2304: - Attachment: test-failure-log-RMWeb.txt TestRMWebServices* fails intermittently --- Key: YARN-2304 URL: https://issues.apache.org/jira/browse/YARN-2304 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: test-failure-log-RMWeb.txt The test fails intermittently because of a bind exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1050) Document the Fair Scheduler REST API
[ https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-1050: -- Attachment: YARN-1050-3.patch Updated to remove whitespace, but the missing '[' is not fixed yet. The JSON response example was generated automatically, e.g. wget -O - http://RM:8088/ws/v1/cluster/scheduler | python -m json.tool. I think the missing '[' is a bug (or is it the spec?), so we should discuss it in a separate JIRA ticket. How about it, [~ajisakaa]? Document the Fair Scheduler REST API Key: YARN-1050 URL: https://issues.apache.org/jira/browse/YARN-1050 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Sandy Ryza Assignee: Kenji Kikushima Attachments: YARN-1050-2.patch, YARN-1050-3.patch, YARN-1050.patch The documentation should be placed here along with the Capacity Scheduler documentation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1050) Document the Fair Scheduler REST API
[ https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064682#comment-14064682 ] Hadoop QA commented on YARN-1050: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656239/YARN-1050-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4341//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4341//console This message is automatically generated. Document the Fair Scheduler REST API Key: YARN-1050 URL: https://issues.apache.org/jira/browse/YARN-1050 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Sandy Ryza Assignee: Kenji Kikushima Attachments: YARN-1050-2.patch, YARN-1050-3.patch, YARN-1050.patch The documentation should be placed here along with the Capacity Scheduler documentation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
J.Andreina created YARN-2305: Summary: When a container is in reserved state then total cluster memory is displayed wrongly. Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina ENV Details: = 3 queues: a(50%), b(25%), c(25%) --- All max utilization is set to 100. 2-node cluster with total memory of 16GB. TestSteps: = Execute the following 3 jobs with different memory configurations for the Map, Reduce and AM tasks: ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = When 2GB of memory is in the reserved state, total memory is shown as 15GB and used as 15GB (while the actual total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2299) inconsistency at identifying node
[ https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064711#comment-14064711 ] Hong Zhiguo commented on YARN-2299: --- Or make use of the existing config property yarn.scheduler.include-port-in-node-name when differentiating nodes. inconsistency at identifying node - Key: YARN-2299 URL: https://issues.apache.org/jira/browse/YARN-2299 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Critical If the port of yarn.nodemanager.address is not specified on the NM, the NM will choose a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS restart) and is then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, host:port1 and host:port2 will both be present in Active Nodes on the WebUI for a while, and after host:port1 expires, we get host:port1 in Lost Nodes and host:port2 in Active Nodes. If the NM dies ungracefully again, we get only host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost Nodes. In another case, two NMs are running on the same host (miniYarnCluster or other test purposes); if both of them are lost, we get only one Lost Node in the WebUI. In both cases, the sum of Active Nodes and Lost Nodes is not the number of nodes we expect. The root cause is an inconsistency in how we decide whether two nodes are identical. When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains the port; two nodes with the same host but different ports are considered different nodes. But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the host; two nodes with the same host but different ports are considered identical. To fix the inconsistency, we should differentiate the 2 cases below and be consistent for both of them: - intentionally running multiple NMs per host - NM instances running one after another on the same host Two possible solutions: 1) Introduce a boolean config like one-node-per-host (default true), and use the host to differentiate nodes on the RM if it's true. 2) Make it mandatory to have a valid port in the yarn.nodemanager.address config. In this situation, NM instances running one after another on the same host will have the same NodeId, while intentional multiple NMs per host will have different NodeIds. Personally I prefer option 1 because it's easier for users. -- This message was sent by Atlassian JIRA (v6.2#6252)
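A minimal sketch of what option 1 could look like on the RM side (the config flag and helper below are illustrative assumptions, not the actual change): derive the key used for both the active map (RMContextImpl.nodes) and the inactive map (RMContextImpl.inactiveNodes) by one rule, either host-only or host:port, so a restarted NM on the same host is treated consistently in both places.
{code}
import org.apache.hadoop.yarn.api.records.NodeId;

// Sketch only: one consistent identity rule for active and inactive node maps.
// "oneNodePerHost" stands for the hypothetical one-node-per-host setting.
public final class NodeKey {
  public static String of(NodeId nodeId, boolean oneNodePerHost) {
    // host-only: a restarted NM replaces the old entry;
    // host:port: two NMs on one host stay distinct.
    return oneNodePerHost
        ? nodeId.getHost()
        : nodeId.getHost() + ":" + nodeId.getPort();
  }

  private NodeKey() {
  }
}
{code}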
[jira] [Assigned] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo reassigned YARN-2305: - Assignee: Hong Zhiguo When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Hong Zhiguo ENV Details: = 3 queues : a(50%),b(25%),c(25%) --- All max utilization is set to 100 2 Node cluster with total memory as 16GB TestSteps: = Execute following 3 jobs with different memory configurations for Map , reducer and AM task ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = when 2GB memory is in reserved state totoal memory is shown as 15GB and used as 15GB ( while total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064732#comment-14064732 ] Hong Zhiguo commented on YARN-2305: --- Are you using the fair scheduler? If yes, then I think it's due to the same reason as YARN-2306. When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina ENV Details: = 3 queues: a(50%), b(25%), c(25%) --- All max utilization is set to 100. 2-node cluster with total memory of 16GB. TestSteps: = Execute the following 3 jobs with different memory configurations for the Map, Reduce and AM tasks: ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = When 2GB of memory is in the reserved state, total memory is shown as 15GB and used as 15GB (while the actual total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated YARN-2305: - Attachment: Capture.jpg When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Hong Zhiguo Attachments: Capture.jpg ENV Details: = 3 queues : a(50%),b(25%),c(25%) --- All max utilization is set to 100 2 Node cluster with total memory as 16GB TestSteps: = Execute following 3 jobs with different memory configurations for Map , reducer and AM task ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = when 2GB memory is in reserved state totoal memory is shown as 15GB and used as 15GB ( while total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064735#comment-14064735 ] J.Andreina commented on YARN-2305: -- I am using the Capacity Scheduler. When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Hong Zhiguo Attachments: Capture.jpg ENV Details: = 3 queues: a(50%), b(25%), c(25%) --- All max utilization is set to 100. 2-node cluster with total memory of 16GB. TestSteps: = Execute the following 3 jobs with different memory configurations for the Map, Reduce and AM tasks: ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = When 2GB of memory is in the reserved state, total memory is shown as 15GB and used as 15GB (while the actual total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064736#comment-14064736 ] Sunil G commented on YARN-2305: --- No, it's the Capacity Scheduler. I could collect logs from Andreina for the same scenario. I will take over and analyze it for the Capacity Scheduler.
{noformat}
2014-07-15 16:56:50,720 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405414066690_0023_01_000129 Container Transitioned from NEW to RESERVED
2014-07-15 16:56:50,720 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application attempt=appattempt_1405414066690_0023_01 resource=memory:2048, vCores:1 queue=a: capacity=0.5, absoluteCapacity=0.5, usedResources=memory:7168, vCores:4, usedCapacity=0.875, absoluteUsedCapacity=0.4375, numApps=1, numContainers=4 node=host: host-10-18-40-14:45026 #containers=4 available=1024 used=7168 clusterResource=memory:16384, vCores:16
2014-07-15 16:56:50,720 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.a stats: a: capacity=0.5, absoluteCapacity=0.5, usedResources=memory:9216, vCores:5, usedCapacity=1.125, absoluteUsedCapacity=0.5625, numApps=1, numContainers=5
2014-07-15 16:56:50,720 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0625 absoluteUsedCapacity=1.0625 used=memory:17408, vCores:10 cluster=memory:16384, vCores:16
{noformat}
When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Hong Zhiguo Attachments: Capture.jpg ENV Details: = 3 queues: a(50%), b(25%), c(25%) --- All max utilization is set to 100. 2-node cluster with total memory of 16GB. TestSteps: = Execute the following 3 jobs with different memory configurations for the Map, Reduce and AM tasks: ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = When 2GB of memory is in the reserved state, total memory is shown as 15GB and used as 15GB (while the actual total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064737#comment-14064737 ] Sunil G commented on YARN-2305: --- Hi [~zhiguohong], Could I take over the issue.. When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Hong Zhiguo Attachments: Capture.jpg ENV Details: = 3 queues : a(50%),b(25%),c(25%) --- All max utilization is set to 100 2 Node cluster with total memory as 16GB TestSteps: = Execute following 3 jobs with different memory configurations for Map , reducer and AM task ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = when 2GB memory is in reserved state totoal memory is shown as 15GB and used as 15GB ( while total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Summary: leak of reservation metrics (fair scheduler) (was: leak of reservation metrics) leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the reservation metrics (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and the wrong values may confuse them. -- This message was sent by Atlassian JIRA (v6.2#6252)
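A minimal sketch of the kind of cleanup the description implies for the fair scheduler, assuming the removal path walks the attempt's outstanding reservations; the helper method below is illustrative and not taken from YARN-2306.patch.
{code}
// Sketch only: when an app attempt (or a node) is removed, hand its reserved
// resources back to the queue metrics, mirroring what already happens when a
// reservation is fulfilled or explicitly cancelled.
private void releaseReservationMetrics(FSQueue queue,
    SchedulerApplicationAttempt attempt) {
  for (RMContainer reserved : attempt.getReservedContainers()) {
    queue.getMetrics().unreserveResource(
        attempt.getUser(), reserved.getReservedResource());
  }
}
{code}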
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064738#comment-14064738 ] Hong Zhiguo commented on YARN-2305: --- OK When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Hong Zhiguo Attachments: Capture.jpg ENV Details: = 3 queues : a(50%),b(25%),c(25%) --- All max utilization is set to 100 2 Node cluster with total memory as 16GB TestSteps: = Execute following 3 jobs with different memory configurations for Map , reducer and AM task ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = when 2GB memory is in reserved state totoal memory is shown as 15GB and used as 15GB ( while total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-2305: - Assignee: Sunil G (was: Hong Zhiguo) When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Sunil G Attachments: Capture.jpg ENV Details: = 3 queues : a(50%),b(25%),c(25%) --- All max utilization is set to 100 2 Node cluster with total memory as 16GB TestSteps: = Execute following 3 jobs with different memory configurations for Map , reducer and AM task ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = when 2GB memory is in reserved state totoal memory is shown as 15GB and used as 15GB ( while total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2307) Capacity scheduler user only ADMINISTER_QUEUE also can submit app
tangjunjie created YARN-2307: Summary: Capacity scheduler user only ADMINISTER_QUEUE also can submit app Key: YARN-2307 URL: https://issues.apache.org/jira/browse/YARN-2307 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.3.0 Environment: hadoop 2.3.0 centos6.5 jdk1.7 Reporter: tangjunjie Priority: Minor
Queue acls for user : root
Queue  Operations
=====================
root
default
china  ADMINISTER_QUEUE
unfunded
User root only has ADMINISTER_QUEUE, but user root can still submit apps to the china queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064749#comment-14064749 ] Wangda Tan commented on YARN-2305: -- Hi [~sunilg], Thanks for taking this issue. I think there are two issues in your screenshot: 1) Root queue usage above 100%: It is possible that a queue's used resource is larger than its guaranteed resource because of container reservation. We may need to show reserved resource and used resource separately in our web UI. I encountered a similar problem in YARN-2285 too. 2) Total cluster memory shown on the web UI is different from CapacityScheduler.clusterResource: This seems a new issue to me; the memory shown on the web UI is usedMemory+availableMemory of the root queue. I feel like CSQueueUtils.updateQueueStatistics has some issues when we reserve a container in LeafQueue. Hope to get more thoughts from your side. Thanks, Wangda When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Sunil G Attachments: Capture.jpg ENV Details: = 3 queues: a(50%), b(25%), c(25%) --- All max utilization is set to 100. 2-node cluster with total memory of 16GB. TestSteps: = Execute the following 3 jobs with different memory configurations for the Map, Reduce and AM tasks: ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = When 2GB of memory is in the reserved state, total memory is shown as 15GB and used as 15GB (while the actual total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
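To make the second point concrete, a small illustration of the arithmetic using the numbers reported in this issue; that the UI total is usedMemory + availableMemory of the root queue comes from the comment above, while how the reservation perturbs the two terms is exactly the open question.
{code}
// Illustration only, not CSQueueUtils code.
int clusterMB   = 16 * 1024;  // CapacityScheduler.clusterResource
int usedMB      = 15 * 1024;  // "Memory Used" shown on the web UI
int availableMB = 0;          // what availableMemory must have been reported as
int displayedTotalMB = usedMB + availableMB;  // 15 GB, not the real 16 GB
// The suspicion is that the 2 GB reservation makes updateQueueStatistics
// under-report one of the two terms, so their sum drifts away from clusterMB.
{code}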
[jira] [Commented] (YARN-2307) Capacity scheduler user only ADMINISTER_QUEUE also can submit app
[ https://issues.apache.org/jira/browse/YARN-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064760#comment-14064760 ] Sunil G commented on YARN-2307: --- By default, SUBMIT_APPLICATIONS at the root queue level takes * as the default [if not configured], so any user's job submission to a child queue will pass. I think it's a config issue (if so, we can mark this as invalid); if not, please share the config details. Capacity scheduler user only ADMINISTER_QUEUE also can submit app -- Key: YARN-2307 URL: https://issues.apache.org/jira/browse/YARN-2307 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.3.0 Environment: hadoop 2.3.0 centos6.5 jdk1.7 Reporter: tangjunjie Priority: Minor Queue acls for user : root Queue Operations = root default china ADMINISTER_QUEUE unfunded User root only has ADMINISTER_QUEUE, but user root can still submit apps to the china queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
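For context on why the root-level default matters: capacity-scheduler queue ACLs are evaluated hierarchically, so a user may submit to a leaf queue if the submit ACL of that queue or of any ancestor (including root, which defaults to *) allows it. A rough sketch of that evaluation, illustrative rather than the actual CapacityScheduler code:
{code}
// Sketch only: with the default root ACL of "*", this returns true for every
// user regardless of the leaf queue's own ACL, matching the behaviour
// reported in this issue.
boolean canSubmit(CSQueue queue, UserGroupInformation user) {
  for (CSQueue q = queue; q != null; q = q.getParent()) {
    if (q.hasAccess(QueueACL.SUBMIT_APPLICATIONS, user)) {
      return true;
    }
  }
  return false;
}
{code}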
[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: YARN-2306.patch leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306.patch This only applies to fair scheduler. Capacity scheduler is OK. When appAttempt or node is removed, the metrics for reservation(reservedContainers, reservedMB, reservedVCores) is not reduced back. These are important metrics for administrator. The wrong metrics confuses may confuse them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
Wangda Tan created YARN-2308: Summary: NPE happened when RM restart after CapacityScheduler queue configuration changed Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan I encountered an NPE when the RM restarted:
{code}
2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:744)
{code}
And the RM fails to restart. This is caused by a queue configuration change: I removed some queues and added new queues. So when the RM restarts, it tries to recover historical applications, and when the queue of any of these applications has been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
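A minimal sketch of the kind of guard that would avoid the NPE during recovery, assuming the failure happens where the recovered attempt's queue is looked up; the snippet is illustrative, not a proposed patch.
{code}
// Sketch only: while recovering APP_ATTEMPT_ADDED, check whether the
// application's configured queue still exists before dereferencing it, and
// fail the recovered application cleanly if the queue was removed.
CSQueue queue = getQueue(application.getQueue());
if (queue == null) {
  LOG.error("Queue " + application.getQueue() + " no longer exists;"
      + " cannot recover application " + application.getApplicationId());
  // reject / fail the application instead of hitting a NullPointerException
  return;
}
{code}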
[jira] [Commented] (YARN-2264) Race in DrainDispatcher can cause random test failures
[ https://issues.apache.org/jira/browse/YARN-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064797#comment-14064797 ] Hudson commented on YARN-2264: -- FAILURE: Integrated in Hadoop-Yarn-trunk #615 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/615/]) YARN-2264. Fixed a race condition in DrainDispatcher which may cause random test failures. Contributed by Li Lu (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611126) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java Race in DrainDispatcher can cause random test failures -- Key: YARN-2264 URL: https://issues.apache.org/jira/browse/YARN-2264 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Seth Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2264-070814.patch This is the potential race. DrainDispatcher is started via serviceStart(). As a last step, this starts the actual dispatcher thread (eventHandlingThread.start()) and returns immediately, which means the thread may or may not have started up by the time start returns. Event sequence: UserThread: calls dispatcher.getEventHandler().handle(). This sets drained = false, and a context switch happens. DispatcherThread: starts running. DispatcherThread: drained = queue.isEmpty(); - this sets drained to true, since the user thread yielded before putting anything into the queue. UserThread: actual.handle(event) - this puts the event in the queue for the dispatcher thread to process, and returns control. UserThread: dispatcher.await() - since drained is true, this returns immediately, even though there is a pending event to process. -- This message was sent by Atlassian JIRA (v6.2#6252)
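For reference, a minimal sketch of the race described above and one way to close it: update the drained flag and the event queue under the same lock, so the handle()/await() interleaving can never observe a stale drained=true while an event is pending. This is a sketch of the idea only, not the committed change.
{code}
// Sketch only: make "drained" and the queue contents change atomically.
private final Object lock = new Object();
private final BlockingQueue<Event> queue = new LinkedBlockingQueue<Event>();
private volatile boolean drained = true;

void handleSketch(Event event) {
  synchronized (lock) {
    drained = false;       // flipped together with the enqueue, so the
    queue.add(event);      // dispatcher loop can never see drained=true
  }                        // while this event is still pending
}

void dispatchOneSketch() throws InterruptedException {
  Event event = queue.take();
  // ... dispatch the event ...
  synchronized (lock) {
    drained = queue.isEmpty();
  }
}
{code}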
[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064813#comment-14064813 ] Hadoop QA commented on YARN-2306: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656254/YARN-2306.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4342//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4342//console This message is automatically generated. leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306.patch This only applies to fair scheduler. Capacity scheduler is OK. When appAttempt or node is removed, the metrics for reservation(reservedContainers, reservedMB, reservedVCores) is not reduced back. These are important metrics for administrator. The wrong metrics confuses may confuse them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2309) NPE during RM-Restart test scenario
Nishan Shetty created YARN-2309: --- Summary: NPE during RM-Restart test scenario Key: YARN-2309 URL: https://issues.apache.org/jira/browse/YARN-2309 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Nishan Shetty Priority: Minor During RM restart test scenarios, we encountered the exception below. A point to note here is that ZooKeeper was also not stable during this testing; we could see many ZooKeeper exceptions before getting this NPE.
{code}
2014-07-10 10:49:46,817 WARN org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:125)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1039)
{code}
ZooKeeper exception:
{code}
2014-07-10 10:49:46,816 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1046)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1017)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:632)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:766)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
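Given that the ZooKeeper connection loss makes serviceInit fail before the service is fully constructed, the NPE in serviceStop looks like a missing null guard on state created during init. A minimal sketch of the defensive pattern (the field name is an assumption used for illustration):
{code}
// Sketch only: serviceStop can run against a partially-initialized service,
// because CompositeService stops already-inited children when init fails, so
// anything created in serviceInit must be null-checked before use.
@Override
protected synchronized void serviceStop() throws Exception {
  if (elector != null) {            // may be null if serviceInit failed early
    elector.quitElection(false);
    elector.terminateConnection();
  }
  super.serviceStop();
}
{code}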
[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned
[ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064849#comment-14064849 ] Hadoop QA commented on YARN-2045: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656193/YARN-2045-v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4343//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4343//console This message is automatically generated. Data persisted in NM should be versioned Key: YARN-2045 URL: https://issues.apache.org/jira/browse/YARN-2045 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2045-v2.patch, YARN-2045-v3.patch, YARN-2045-v4.patch, YARN-2045-v5.patch, YARN-2045.patch As a split task from YARN-667, we want to add version info to NM related data, include: - NodeManager local LevelDB state - NodeManager directory structure -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064870#comment-14064870 ] Hadoop QA commented on YARN-1341: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656064/YARN-1341v7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4344//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4344//console This message is automatically generated. Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch, YARN-1341v7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064887#comment-14064887 ] Naganarasimha G R commented on YARN-2301: - Hi [~jianhe], could I work on this issue? Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Labels: usability While running the yarn container -list <Application Attempt ID> command, some observations: 1) the scheme (e.g. http/https) before LOG-URL is missing 2) the start-time is printed as milliseconds (e.g. 1405540544844); better to print it in a time format 3) finish-time is 0 if the container is not yet finished; maybe print N/A instead 4) it may be useful to also have an option to run yarn container -list <appId> OR yarn application -list-containers <appId>. As the attempt Id is not shown on the console, it is easier for the user to just copy the appId and run it; this may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
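On point 2, a small sketch of the conversion meant there: turning the raw epoch milliseconds (e.g. 1405540544844) into a readable timestamp before printing. The class name and format string are illustrative only.
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class StartTimeFormat {
  public static void main(String[] args) {
    long startTimeMillis = 1405540544844L;  // value as currently printed
    // Print a human-readable time instead of the raw millisecond value.
    String readable = new SimpleDateFormat("EEE dd MMM yyyy HH:mm:ss Z")
        .format(new Date(startTimeMillis));
    System.out.println(readable);
  }
}
{code}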
[jira] [Commented] (YARN-2219) AMs and NMs can get exceptions after recovery but before scheduler knowns apps and app-attempts
[ https://issues.apache.org/jira/browse/YARN-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064902#comment-14064902 ] Hudson commented on YARN-2219: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1834/]) YARN-2219. Addendum patch for YARN-2219 (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611240) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java YARN-2219. Changed ResourceManager to avoid AMs and NMs getting exceptions after RM recovery but before scheduler learns about apps and app-attempts. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611222) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java AMs and NMs can get exceptions after recovery but before scheduler knowns apps and app-attempts --- Key: YARN-2219 URL: https://issues.apache.org/jira/browse/YARN-2219 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Ashwin Shankar Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2219-fix-compilation-failure.txt, YARN-2219.1.patch, YARN-2219.2.patch {code} org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart testAppReregisterOnRMWorkPreservingRestart[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) Time elapsed: 4.335 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:91) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297) at org.apache.hadoop.yarn.server.resourcemanager.MockAM$1.run(MockAM.java:113) at
[jira] [Commented] (YARN-2264) Race in DrainDispatcher can cause random test failures
[ https://issues.apache.org/jira/browse/YARN-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064904#comment-14064904 ] Hudson commented on YARN-2264: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1834/]) YARN-2264. Fixed a race condition in DrainDispatcher which may cause random test failures. Contributed by Li Lu (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611126) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java Race in DrainDispatcher can cause random test failures -- Key: YARN-2264 URL: https://issues.apache.org/jira/browse/YARN-2264 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Seth Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2264-070814.patch This is what can happen. This is the potential race. DrainDispatcher is started via serviceStart() . As a last step, this starts the actual dispatcher thread (eventHandlingThread.start() - and returns immediately - which means the thread may or may not have started up by the time start returns. Event sequence: UserThread: calls dispatcher.getEventHandler().handle() This sets drained = false, and a context switch happens. DispatcherThread: starts running DispatcherThread drained = queue.isEmpty(); - This sets drained to true, since Thread1 yielded before putting anything into the queue. UserThread: actual.handle(event) - which puts the event in the queue for the dispatcher thread to process, and returns control. UserThread: dispatcher.await() - Since drained is true, this returns immediately - even though there is a pending event to process. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2264) Race in DrainDispatcher can cause random test failures
[ https://issues.apache.org/jira/browse/YARN-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064925#comment-14064925 ] Hudson commented on YARN-2264: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1807 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1807/]) YARN-2264. Fixed a race condition in DrainDispatcher which may cause random test failures. Contributed by Li Lu (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611126) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java Race in DrainDispatcher can cause random test failures -- Key: YARN-2264 URL: https://issues.apache.org/jira/browse/YARN-2264 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Seth Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2264-070814.patch This is what can happen. This is the potential race. DrainDispatcher is started via serviceStart() . As a last step, this starts the actual dispatcher thread (eventHandlingThread.start() - and returns immediately - which means the thread may or may not have started up by the time start returns. Event sequence: UserThread: calls dispatcher.getEventHandler().handle() This sets drained = false, and a context switch happens. DispatcherThread: starts running DispatcherThread drained = queue.isEmpty(); - This sets drained to true, since Thread1 yielded before putting anything into the queue. UserThread: actual.handle(event) - which puts the event in the queue for the dispatcher thread to process, and returns control. UserThread: dispatcher.await() - Since drained is true, this returns immediately - even though there is a pending event to process. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2219) AMs and NMs can get exceptions after recovery but before scheduler knowns apps and app-attempts
[ https://issues.apache.org/jira/browse/YARN-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064923#comment-14064923 ] Hudson commented on YARN-2219: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1807 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1807/]) YARN-2219. Addendum patch for YARN-2219 (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611240) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java YARN-2219. Changed ResourceManager to avoid AMs and NMs getting exceptions after RM recovery but before scheduler learns about apps and app-attempts. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611222) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java AMs and NMs can get exceptions after recovery but before scheduler knowns apps and app-attempts --- Key: YARN-2219 URL: https://issues.apache.org/jira/browse/YARN-2219 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Ashwin Shankar Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2219-fix-compilation-failure.txt, YARN-2219.1.patch, YARN-2219.2.patch {code} org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart testAppReregisterOnRMWorkPreservingRestart[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) Time elapsed: 4.335 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:91) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297) at org.apache.hadoop.yarn.server.resourcemanager.MockAM$1.run(MockAM.java:113) at
[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065002#comment-14065002 ] Zhijie Shen commented on YARN-2247: --- bq. The current implementation uses the standard http authentication for hadoop. Users can set it to simple if they choose. I was trying to make the point that when kerberos authentication is not used, simple authentication is not implicitly set, is it? In this case, without the authentication filter, we cannot identify the user via the HTTP interface, so we cannot behave correctly for operations that require knowledge of the user, such as submitting/killing an application. Taking one step back, let's look at the analogous RPC interfaces. By default, the authentication is SIMPLE, and on the server side we can still identify who the user is, so features such as ACLs still work in the SIMPLE case. bq. For now I'd like to use the same configs as the standard hadoop http auth. I'm open to changing them if we feel strongly about it in the future. It's okay to keep the configuration the same. Just thinking out loud: if so, you may not want to add RM_WEBAPP_USE_YARN_AUTH_FILTER at all, and not load YarnAuthenticationFilterInitializer programmatically. The rationales behind them are similar. Previously, I tried to add TimelineAuthenticationFilterInitializer programmatically because I found that the same http auth config applies to different daemons, and it's annoying that on a single-node cluster, configuring something only for the timeline server affects the other daemons. Afterwards, I made the timeline server use a set of configs with the timeline-service prefix. This is what we did for the RPC interface configurations. bq. I didn't understand - can you explain further? Let's take RMWebServices#getApp as an example. Previously we didn't have (or at least didn't know about) the auth filter, so we could not detect the user. Therefore, we didn't check the ACLs, and simply got the application from RMContext and returned it to the user. Now that we have the auth filter, we can identify the user. Hence, it's possible for us to fix this API to only return the application information to a user that has access. This is another reason why I suggest always having the authentication filter on, whether it is simple or kerberos. bq. Am I looking at the wrong file? This is the right file, but I'm afraid it is not the correct logic. AuthenticationFilter accepts a null secret file. However, if we use AuthenticationFilterInitializer to construct AuthenticationFilter, the null case is rejected. I previously opened a ticket for this issue (HADOOP-10600). Allow RM web services users to authenticate using delegation tokens --- Key: YARN-2247 URL: https://issues.apache.org/jira/browse/YARN-2247 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: apache-yarn-2247.0.patch, apache-yarn-2247.1.patch, apache-yarn-2247.2.patch The RM webapp should allow users to authenticate using delegation tokens to maintain parity with RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2310) Revisit the APIs in RM web services where user information can make difference
Zhijie Shen created YARN-2310: - Summary: Revisit the APIs in RM web services where user information can make difference Key: YARN-2310 URL: https://issues.apache.org/jira/browse/YARN-2310 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen After YARN-2247, RM web services can be sheltered by the authentication filter, which can help to identify who the user is. With this information, we should be able to fix the security problem of some existing APIs, such as getApp, getAppAttempts, getApps. We should use the user information to check the ACLs before returning the requested data to the user. -- This message was sent by Atlassian JIRA (v6.2#6252)
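For illustration, a hedged sketch of the kind of check being proposed, assuming the caller's identity is made available by the authentication filter; the helper names here (checkAccess and the generic app object) are hypothetical, not the actual RMWebServices API:

{code}
// Illustrative sketch only: return application information only if the
// caller passes the application's view-ACL check.
import org.apache.hadoop.security.UserGroupInformation;

class AppInfoSketch {
  Object getApp(String remoteUser, Object app) {
    // remoteUser is whatever the authentication filter established for this request.
    UserGroupInformation callerUGI = UserGroupInformation.createRemoteUser(remoteUser);
    if (!checkAccess(callerUGI, app)) {
      // Without access, do not leak application details to the caller.
      throw new SecurityException(
          "User " + remoteUser + " is not authorized to view this application");
    }
    return app;
  }

  // Placeholder for an ACL check (e.g. an ApplicationAccessType.VIEW_APP style check).
  private boolean checkAccess(UserGroupInformation caller, Object app) {
    return true;
  }
}
{code}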
[jira] [Created] (YARN-2311) Revisit RM web pages where user information may make difference.
Zhijie Shen created YARN-2311: - Summary: Revisit RM web pages where user information may make difference. Key: YARN-2311 URL: https://issues.apache.org/jira/browse/YARN-2311 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Similar to YARN-2310, the RM web pages list information without considering whether the user has access to it. This could be fixed after YARN-2247, via which we can use the authentication filter to identify the user of the incoming request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065027#comment-14065027 ] Zhijie Shen commented on YARN-2247: --- I filed YARN-2310 and YARN-2311 for the third point. Allow RM web services users to authenticate using delegation tokens --- Key: YARN-2247 URL: https://issues.apache.org/jira/browse/YARN-2247 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: apache-yarn-2247.0.patch, apache-yarn-2247.1.patch, apache-yarn-2247.2.patch The RM webapp should allow users to authenticate using delegation tokens to maintain parity with RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2312) Marking ContainerId#getId as deprecated
Tsuyoshi OZAWA created YARN-2312: Summary: Marking ContainerId#getId as deprecated Key: YARN-2312 URL: https://issues.apache.org/jira/browse/YARN-2312 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA After YARN-2229, {{ContainerId#getId}} will only return a partial value of the container id: the sequence number without the epoch. We should mark {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065045#comment-14065045 ] Tsuyoshi OZAWA commented on YARN-2229: -- Created YARN-2312 to address marking getId as a deprecated method. ContainerId can overflow with RM restart Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch On YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch and the lower 22 bits are for the sequence number of the id. This preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid the problem, it's better to make containerId a long. We need to define the new format of the container id while preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
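To make the layout concrete, here is a small worked sketch of the 32-bit format described above (10 epoch bits, 22 sequence bits) and why 1024 restarts overflow it; the class and method names are illustrative, not the actual ContainerId code. It also shows why the int-valued getId loses the epoch, which is what YARN-2312 proposes to deprecate in favour of a long-valued getContainerId:

{code}
// Illustrative sketch of the container id layout discussed above.
// 32-bit form: [ 10 bits epoch | 22 bits sequence ] -> the epoch wraps
// after 2^10 = 1024 RM restarts; widening the id to a long removes the limit.
public class ContainerIdLayoutSketch {
  static final int EPOCH_BITS = 10;
  static final int SEQUENCE_BITS = 22;

  // Old-style 32-bit id: epoch packed into the upper bits.
  static int packInt(int epoch, int sequence) {
    return (epoch << SEQUENCE_BITS) | (sequence & ((1 << SEQUENCE_BITS) - 1));
  }

  // A "getId()"-style accessor: only the sequence part survives, the epoch is lost,
  // which is why an int-returning getId() is being deprecated.
  static int sequenceOf(int packedId) {
    return packedId & ((1 << SEQUENCE_BITS) - 1);
  }

  public static void main(String[] args) {
    int id = packInt(3, 42);               // epoch 3, 42nd container
    System.out.println(sequenceOf(id));    // prints 42 -> the epoch information is gone
    System.out.println(1 << EPOCH_BITS);   // prints 1024 -> epoch overflows after 1024 restarts
  }
}
{code}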
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-415: Attachment: YARN-415.201407171553.txt Thanks [~leftnoteasy]] {quote} No, you can update current trunk code, and check RMContainerImpl#FinishedTransition#updateMetricsIfPreempted, you can change the updateMetricsIfPreempted to something like updateAttemptMetrics. And create a new method in RMAppAttemptMetrics, like updateResourceUtilization. The benefit of doing this is you don need send payload to RMAppAttempt, all you needed information should be existed in RMContainer. {quote} updateMetricsIfPreempted() is using the current attempt. Is there a way to get the RMAppAttempt object for completed attempts. IIUC, there are races where there is no running attempt, such as when an attempt dies after other containers have started and then the app itself dies or is killed. Also, in the case of work-preserving restart, the appattempt could die and it's child containers could be assigned to the second appattempt, I have included a new patch that adds the payload to the CONTAINER_FINISHED event, which is sent to the appropriate RMAppAttempt. The RMAppAttempt then will keep track of it's own stats, even after the container for that appattempt has finished. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
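As a quick worked example of the proposed metric (numbers purely illustrative): an app that held one 1024 MB AM container for 600 s and two 2048 MB task containers for 300 s each would be charged 1024*600 + 2048*300 + 2048*300 = 1,843,200 MB-seconds. A minimal sketch of the accumulation:

{code}
// Illustrative sketch: accumulate memory-seconds per application from the
// reserved memory and lifetime of each finished container.
public class MemorySecondsSketch {
  private long memorySeconds = 0;

  // Called when a container finishes; memoryMb is the memory reserved for it.
  void containerFinished(long memoryMb, long startMillis, long finishMillis) {
    long lifetimeSeconds = (finishMillis - startMillis) / 1000;
    memorySeconds += memoryMb * lifetimeSeconds;
  }

  long getMemorySeconds() {
    return memorySeconds;
  }

  public static void main(String[] args) {
    MemorySecondsSketch app = new MemorySecondsSketch();
    app.containerFinished(1024, 0, 600_000);    // AM container, 600 s
    app.containerFinished(2048, 0, 300_000);    // task container, 300 s
    app.containerFinished(2048, 0, 300_000);    // task container, 300 s
    System.out.println(app.getMemorySeconds()); // prints 1843200
  }
}
{code}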
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065144#comment-14065144 ] Jian He commented on YARN-2301: --- [~Naganarasimha], sure, thanks for working it! Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Labels: usability While running yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https ) before LOG-URL is missing 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to print as time format. 3) finish-time is 0 if container is not yet finished. May be N/A 4) May have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As attempt Id is not shown on console, this is easier for user to just copy the appId and run it, may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065170#comment-14065170 ] Wei Yan commented on YARN-2306: --- Thanks, [~zhiguohong], the patch looks good to me. leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306.patch This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the reservation metrics (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong values may confuse them. -- This message was sent by Atlassian JIRA (v6.2#6252)
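For illustration, a hedged sketch of the symmetry the fix needs (the method names are illustrative, not the exact QueueMetrics API): whatever increments the reservation counters when a container is reserved must be paired with a decrement on every path that drops the reservation, including appAttempt removal and node removal:

{code}
// Illustrative sketch only: reservation counters that are incremented on
// reserve and must be decremented again on every unreserve/cleanup path.
class ReservationMetricsSketch {
  private int reservedContainers;
  private long reservedMB;
  private long reservedVCores;

  void reserveResource(long memoryMB, long vcores) {
    reservedContainers++;
    reservedMB += memoryMB;
    reservedVCores += vcores;
  }

  // Must be called on *every* path that drops a reservation, including
  // appAttempt removal and node removal, or the counters leak upward.
  void unreserveResource(long memoryMB, long vcores) {
    reservedContainers--;
    reservedMB -= memoryMB;
    reservedVCores -= vcores;
  }

  int getReservedContainers() { return reservedContainers; }
  long getReservedMB()        { return reservedMB; }
  long getReservedVCores()    { return reservedVCores; }
}
{code}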
[jira] [Created] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
Tsuyoshi OZAWA created YARN-2313: Summary: Livelock can occur on FairScheduler when there are lots entry in queue Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Observed a livelock on FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur: 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) when there are lots of queues. 2. UpdateThread goes into a busy loop. 3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2313: - Attachment: YARN-2313.1.patch Ideally, UPDATE_INTERVAL should be calculated based on the current number of entries in the queue. Another workaround is making UPDATE_INTERVAL configurable. The attached patch takes the latter approach, because it's easier to implement. Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch Observed a livelock on FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur: 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) when there are lots of queues. 2. UpdateThread goes into a busy loop. 3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
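A minimal sketch of the configurable-interval workaround, assuming the interval is read once at init time (the config key shown is for illustration; the actual key is whatever the attached patch defines):

{code}
// Illustrative sketch only: an update thread whose interval comes from
// configuration instead of a hard-coded 500 ms constant.
import org.apache.hadoop.conf.Configuration;

class UpdateThreadSketch implements Runnable {
  // Hypothetical key and default for illustration; not necessarily what the patch uses.
  static final String UPDATE_INTERVAL_KEY = "yarn.scheduler.fair.update-interval-ms";
  static final long DEFAULT_UPDATE_INTERVAL_MS = 500;

  private final long updateIntervalMs;

  UpdateThreadSketch(Configuration conf) {
    updateIntervalMs = conf.getLong(UPDATE_INTERVAL_KEY, DEFAULT_UPDATE_INTERVAL_MS);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(updateIntervalMs);  // back off for the configured interval
        update();                        // recompute fair shares, demand, etc.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  private void update() { /* scheduler update pass */ }
}
{code}

Raising the interval lets an administrator keep the update pass from consuming the whole period on clusters with very many queues, at the cost of slightly staler fair shares.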
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065203#comment-14065203 ] Hadoop QA commented on YARN-415: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656291/YARN-415.201407171553.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.util.TestFSDownload org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4345//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4345//console This message is automatically generated. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... 
+ (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2313: - Attachment: rm-stack-trace.txt Attached stack trace when we faced the problem. Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, rm-stack-trace.txt Observed livelock on FairScheduler when there are lots entry in queue. After my investigating code, following case can occur: 1. {{update()}} called by UpdateThread takes longer times than UPDATE_INTERVAL(500ms) if there are lots queue. 2. UpdateThread goes busy loop. 3. Other threads(AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065238#comment-14065238 ] Sunil G commented on YARN-2305: --- 1. Yes [~leftnoteasy], GUI display of 106% is similar to YARN-2285. It can be tackled there. 2. As mentioned in earlier comment, Total MB in GUI is internally sum of availableMB+allottedMB. a. *LeafQueue#usedResources* is sum of used and reserved memory. But *CSQueueUtils#updateQueueStatistics()* code may give a -ve value in case of reservation which sets availableMB in QueueMetrics. {code} Resource available = Resources.subtract(queueLimit, usedResources); {code} If this comes as -ve, then *availableMB* is set as 0. b. *allocatedMB*: This is set when a container is really allocated. This is the real queue usage. In above scenario, it should have come as {noformat}15(availableMb)+1(allocatedMB)=16{noformat} But due to reservation, allocatedMB became 0. Hence total shown as 15. I feel instead of showing Total as *allocated+available*, we can show *clusterResource* here. Any particular reason why we need like allocated+available, thoughts? When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Sunil G Attachments: Capture.jpg ENV Details: = 3 queues : a(50%),b(25%),c(25%) --- All max utilization is set to 100 2 Node cluster with total memory as 16GB TestSteps: = Execute following 3 jobs with different memory configurations for Map , reducer and AM task ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue = when 2GB memory is in reserved state totoal memory is shown as 15GB and used as 15GB ( while total memory is 16GB) -- This message was sent by Atlassian JIRA (v6.2#6252)
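A small worked sketch of the clamping being described, with illustrative numbers for a 16 GB cluster: once the reservation pushes usedResources past the queue limit, the subtraction goes negative, availableMB is clamped to 0, and the Total = available + allocated figure drops below the real cluster size:

{code}
// Illustrative arithmetic only, mirroring the mechanism described above.
public class TotalMemorySketch {
  public static void main(String[] args) {
    int clusterMB    = 16 * 1024;   // real cluster size: 16 GB
    int queueLimitMB = 16 * 1024;   // queue limit equal to the cluster here
    int allocatedMB  = 15 * 1024;   // memory of containers actually allocated
    int reservedMB   = 2 * 1024;    // the reserved (not yet allocated) container
    int usedMB       = allocatedMB + reservedMB;   // used + reserved, as in LeafQueue#usedResources

    int availableMB = queueLimitMB - usedMB;       // goes negative once reservations overshoot
    if (availableMB < 0) {
      availableMB = 0;                             // clamped, as in CSQueueUtils#updateQueueStatistics()
    }

    int totalShown = availableMB + allocatedMB;    // what the web UI sums up today
    System.out.println("Total shown = " + totalShown + " MB, cluster = " + clusterMB + " MB");
    // Prints a total below the real cluster size whenever the clamp kicks in,
    // which is the mismatch reported here; showing clusterResource would avoid it.
  }
}
{code}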
[jira] [Created] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
Jason Lowe created YARN-2314: Summary: ContainerManagementProtocolProxy can create thousands of threads for a large cluster Key: YARN-2314 URL: https://issues.apache.org/jira/browse/YARN-2314 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical ContainerManagementProtocolProxy has a cache of NM proxies, and the size of this cache is configurable. However the cache can grow far beyond the configured size when running on a large cluster and blow AM address/container limits. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065294#comment-14065294 ] Hadoop QA commented on YARN-2313: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656305/rm-stack-trace.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4347//console This message is automatically generated. Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, rm-stack-trace.txt Observed livelock on FairScheduler when there are lots entry in queue. After my investigating code, following case can occur: 1. {{update()}} called by UpdateThread takes longer times than UPDATE_INTERVAL(500ms) if there are lots queue. 2. UpdateThread goes busy loop. 3. Other threads(AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2313: - Attachment: (was: YARN-2313.1.patch) Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, rm-stack-trace.txt Observed livelock on FairScheduler when there are lots entry in queue. After my investigating code, following case can occur: 1. {{update()}} called by UpdateThread takes longer times than UPDATE_INTERVAL(500ms) if there are lots queue. 2. UpdateThread goes busy loop. 3. Other threads(AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2313: - Attachment: YARN-2313.1.patch Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, rm-stack-trace.txt Observed livelock on FairScheduler when there are lots entry in queue. After my investigating code, following case can occur: 1. {{update()}} called by UpdateThread takes longer times than UPDATE_INTERVAL(500ms) if there are lots queue. 2. UpdateThread goes busy loop. 3. Other threads(AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065316#comment-14065316 ] Jason Lowe commented on YARN-2314: -- The problem is that the cache doesn't try very hard to remove proxies when the cache is at or beyond the maximum configured size. When adding a new proxy to the cache and it should remove an entry, it simply grabs the least-recently-used proxy and tries to close it. If the entry is currently in use then an entry isn't immediately removed and that means we're running with a cache larger than configured. This can get far worse on a big cluster. For example, if the least-recently-used proxy is currently performing a call that is stuck on socket connection retries, the LRU entry could take quite a while before it closes. During that time each new proxy created will make the same attempt to close that proxy and fail to do so. That means that the cache size is now N-1 larger than it should be when it finally does close where N is the number of proxies created while the LRU entry was busy. On a large cluster with thousands of nodes a proxy hanging on one node could allow the cache to have thousands of more proxies in it than configured. Since each proxy is a thread, that's thousands of threads, and all those thread stacks can blow container limits on the AM (or address limits if it's a 32-bit AM). ContainerManagementProtocolProxy can create thousands of threads for a large cluster Key: YARN-2314 URL: https://issues.apache.org/jira/browse/YARN-2314 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical ContainerManagementProtocolProxy has a cache of NM proxies, and the size of this cache is configurable. However the cache can grow far beyond the configured size when running on a large cluster and blow AM address/container limits. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
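For illustration, a minimal sketch of the eviction pattern being described (the names are illustrative, not the actual ContainerManagementProtocolProxy code): eviction only ever considers the single least-recently-used entry, and skips it when it is in use, so every insertion made while that entry stays busy pushes the cache further past its configured limit:

{code}
// Illustrative sketch of the problematic pattern: an LRU cache that only
// ever tries to evict its single oldest entry, and skips eviction when
// that entry is in use.
import java.util.LinkedHashMap;
import java.util.Map;

class ProxyCacheSketch {
  static class Proxy {
    boolean inUse;          // e.g. a call stuck in connection retries
  }

  private final int maxSize;
  private final LinkedHashMap<String, Proxy> cache =
      new LinkedHashMap<>(16, 0.75f, true);   // access-ordered = LRU

  ProxyCacheSketch(int maxSize) { this.maxSize = maxSize; }

  Proxy getProxy(String nodeAddress) {
    Proxy p = cache.get(nodeAddress);
    if (p == null) {
      if (cache.size() >= maxSize) {
        // Only the LRU entry is considered; a busy LRU entry means
        // nothing is evicted and the cache silently grows.
        Map.Entry<String, Proxy> lru = cache.entrySet().iterator().next();
        if (!lru.getValue().inUse) {
          cache.remove(lru.getKey());
        }
      }
      p = new Proxy();       // in the real client this also starts an IPC thread
      cache.put(nodeAddress, p);
    }
    return p;
  }

  int size() { return cache.size(); }
}
{code}

On a cluster with thousands of nodes, a single stuck LRU proxy therefore lets the cache (and the per-proxy IPC threads behind it) drift far beyond maxSize, which is the thread explosion described above.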
[jira] [Commented] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065312#comment-14065312 ] Hadoop QA commented on YARN-2313: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656304/YARN-2313.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4346//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4346//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4346//console This message is automatically generated. Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, rm-stack-trace.txt Observed livelock on FairScheduler when there are lots entry in queue. After my investigating code, following case can occur: 1. {{update()}} called by UpdateThread takes longer times than UPDATE_INTERVAL(500ms) if there are lots queue. 2. UpdateThread goes busy loop. 3. Other threads(AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065363#comment-14065363 ] Sunil G commented on YARN-2308: --- During *RMAppRecoveredTransition* in RMAppImpl, may be we can check recovered app queue (can get this from submission context) is still a valid queue? If this queue not present, recovery for that app can be made failed, and may be need to do some more RMApp clean up. Sounds doable? NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Priority: Critical I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. This is caused by queue configuration changed, I removed some queues and added new queues. So when RM restarts, it tries to recover history applications, and when any of queues of these applications removed, NPE will be raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
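A hedged sketch of the validation being suggested (the recovery hook and queue lookup are illustrative, not the exact RMAppImpl/scheduler API): look the queue up before replaying the attempt, and fail that app's recovery cleanly instead of letting addApplicationAttempt hit an NPE:

{code}
// Illustrative sketch only: validate that a recovered application's queue
// still exists before the scheduler replays its attempts.
import java.util.Map;

class RecoveryQueueCheckSketch {
  private final Map<String, Object> queues;   // queue name -> queue object

  RecoveryQueueCheckSketch(Map<String, Object> queues) {
    this.queues = queues;
  }

  void recoverApplication(String appId, String queueName) {
    Object queue = queues.get(queueName);
    if (queue == null) {
      // The queue was removed from the configuration between restarts:
      // fail this app's recovery cleanly rather than NPE in addApplicationAttempt().
      throw new IllegalStateException(
          "Queue " + queueName + " no longer exists; failing recovery of " + appId);
    }
    // ... proceed with the normal APP_ADDED / APP_ATTEMPT_ADDED handling ...
  }
}
{code}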
[jira] [Updated] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2208: Attachment: YARN-2208.9.patch AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065414#comment-14065414 ] Hadoop QA commented on YARN-2313: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656316/YARN-2313.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4348//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4348//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4348//console This message is automatically generated. Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, rm-stack-trace.txt Observed livelock on FairScheduler when there are lots entry in queue. After my investigating code, following case can occur: 1. {{update()}} called by UpdateThread takes longer times than UPDATE_INTERVAL(500ms) if there are lots queue. 2. UpdateThread goes busy loop. 3. Other threads(AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065451#comment-14065451 ] Hadoop QA commented on YARN-2208: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656318/YARN-2208.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.util.TestFSDownload org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4349//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4349//console This message is automatically generated. AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2208: Attachment: YARN-2208.9.patch Try again with the same patch AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065525#comment-14065525 ] Hadoop QA commented on YARN-2208: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656332/YARN-2208.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.util.TestFSDownload org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4350//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4350//console This message is automatically generated. AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-415: Attachment: YARN-415.201407172144.txt Thank you, [~leftnoteasy] {quote} No, you can update current trunk code, and check RMContainerImpl#FinishedTransition#updateMetricsIfPreempted, you can change the updateMetricsIfPreempted to something like updateAttemptMetrics. And create a new method in RMAppAttemptMetrics, like updateResourceUtilization. The benefit of doing this is you don need send payload to RMAppAttempt, all you needed information should be existed in RMContainer. {quote} I see that I can use RMApp#getRMAppAttempt to get the attempt that belongs to the container, so your suggestion will work for this use case. This is a cleaner solution. I have updated the patch with your suggestions. I am still looking into the test problems. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned
[ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065669#comment-14065669 ] Jason Lowe commented on YARN-2045: -- Thanks for updating the patch! bq. Also, I like suggestion to make unit test as a black box which may only handle NMStateStore's start and stop. However, in this case, it could means we need extra API to update CURRENT_VERSION_INFO which is a constant now (but could be changed to different values across different YARN versions) What I meant is instead of using checkVersion to verify the version we would instead stop and start the state store to see if it accepted the version. We would still need to use the storeVersion(NMDBSchemaVersion) package-private method to store a custom version after it starts then restart the state store to verify it either started up or failed due to an incompatible version. It's not a big deal if you'd rather keep it as-is. Otherwise latest patch looks good. Will give it a closer look tomorrow for what I think will be final review/commit, and that will also give [~vvasudev] a chance to take another look. Data persisted in NM should be versioned Key: YARN-2045 URL: https://issues.apache.org/jira/browse/YARN-2045 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2045-v2.patch, YARN-2045-v3.patch, YARN-2045-v4.patch, YARN-2045-v5.patch, YARN-2045.patch As a split task from YARN-667, we want to add version info to NM related data, include: - NodeManager local LevelDB state - NodeManager directory structure -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2315) Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.
zhihai xu created YARN-2315: --- Summary: Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In the getQueueInfo function of FSQueue.java, we call setCapacity twice with different parameters, so the first call is overridden by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.2#6252)
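Since both calls are quoted above, the fix is mechanical; a sketch of the corrected lines, assuming the surrounding FSQueue#getQueueInfo code stays as-is:

{code}
// Before (both values are written into "capacity"; the first is overwritten):
queueInfo.setCapacity((float) getFairShare().getMemory() /
    scheduler.getClusterResource().getMemory());
queueInfo.setCapacity((float) getResourceUsage().getMemory() /
    scheduler.getClusterResource().getMemory());

// After (sketch of the proposed change: the second value is the *current*,
// i.e. used, capacity):
queueInfo.setCapacity((float) getFairShare().getMemory() /
    scheduler.getClusterResource().getMemory());
queueInfo.setCurrentCapacity((float) getResourceUsage().getMemory() /
    scheduler.getClusterResource().getMemory());
{code}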
[jira] [Updated] (YARN-2315) Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2315: Attachment: YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2316) TestNMWebServices* get failed on trunk
Junping Du created YARN-2316: Summary: TestNMWebServices* get failed on trunk Key: YARN-2316 URL: https://issues.apache.org/jira/browse/YARN-2316 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du From the Jenkins runs in YARN-2045 and YARN-1341, these tests fail with a bind exception (address already bound). A similar issue happens at RMWebService (YARN-2304) and AMWebService (MAPREDUCE-5973) as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2316) TestNMWebServices* get failed on trunk
[ https://issues.apache.org/jira/browse/YARN-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065705#comment-14065705 ] Junping Du commented on YARN-2316: -- I suspect some recently added Web-Services tests didn't do proper cleanup. TestNMWebServices* get failed on trunk -- Key: YARN-2316 URL: https://issues.apache.org/jira/browse/YARN-2316 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du From the Jenkins runs in YARN-2045 and YARN-1341, these tests fail with a bind exception (address already bound). A similar issue happens at RMWebService (YARN-2304) and AMWebService (MAPREDUCE-5973) as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065716#comment-14065716 ] Junping Du commented on YARN-1341: -- +1. Patch looks good. Will commit it shortly. Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch, YARN-1341v7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065715#comment-14065715 ] Junping Du commented on YARN-1341: -- I can confirm the test failure is not related to the patch, as it also shows up in YARN-2045. A similar issue happens in the AM WebServices (MAPREDUCE-5973) and RM WebServices (YARN-2304) as well. Already filed YARN-2316 to track these failures. Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch, YARN-1341v7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065743#comment-14065743 ] Hadoop QA commented on YARN-415: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656361/YARN-415.201407172144.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.util.TestFSDownload {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4351//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4351//console This message is automatically generated. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2045) Data persisted in NM should be versioned
[ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2045: - Attachment: YARN-2045-v6.patch Addressed [~jlowe]'s comments in the v6 patch. Data persisted in NM should be versioned Key: YARN-2045 URL: https://issues.apache.org/jira/browse/YARN-2045 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2045-v2.patch, YARN-2045-v3.patch, YARN-2045-v4.patch, YARN-2045-v5.patch, YARN-2045-v6.patch, YARN-2045.patch As a split task from YARN-667, we want to add version info to NM-related data, including: - NodeManager local LevelDB state - NodeManager directory structure -- This message was sent by Atlassian JIRA (v6.2#6252)
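As an illustration of what versioning the persisted NM state enables, here is a minimal compatibility-check sketch. It is an assumption-laden example, not the actual NM state-store code; the class and constants are hypothetical.
{code}
// Hypothetical sketch: compare a persisted state version against the version the
// current NodeManager code writes, and decide whether the old state is loadable.
public class NMStateVersionCheck {

  static final int CURRENT_MAJOR = 1;
  static final int CURRENT_MINOR = 0;

  // Same major version: state is compatible (minor bumps are assumed additive).
  // Different major version: the on-disk layout changed incompatibly.
  static boolean isCompatible(int storedMajor, int storedMinor) {
    return storedMajor == CURRENT_MAJOR;
  }

  public static void main(String[] args) {
    System.out.println(isCompatible(1, 2));  // true: newer minor version, still loadable
    System.out.println(isCompatible(2, 0));  // false: incompatible major version
  }
}
{code}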
[jira] [Commented] (YARN-2315) Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065751#comment-14065751 ] Hadoop QA commented on YARN-2315: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656373/YARN-2315.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4352//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4352//console This message is automatically generated. Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In the getQueueInfo method of FSQueue.java, we call setCapacity twice with different parameters, so the first call is overridden by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.2#6252)
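For clarity, a rough sketch of the proposed change (illustrative only, not the exact FSQueue source): the fair-share fraction keeps going through setCapacity, while the in-use fraction goes through setCurrentCapacity instead of silently overwriting the first value.
{code}
// Sketch of the proposed fix in FSQueue#getQueueInfo (assumes the surrounding
// FSQueue context: queueInfo, scheduler, getFairShare(), getResourceUsage()).
float clusterMemory = scheduler.getClusterResource().getMemory();

// Configured (fair-share) capacity as a fraction of the cluster.
queueInfo.setCapacity((float) getFairShare().getMemory() / clusterMemory);

// Currently used capacity as a fraction of the cluster. Previously this was a
// second setCapacity call, which overwrote the value set just above.
queueInfo.setCurrentCapacity((float) getResourceUsage().getMemory() / clusterMemory);
{code}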
[jira] [Created] (YARN-2317) Update documentation about how to write YARN applications
Li Lu created YARN-2317: --- Summary: Update documentation about how to write YARN applications Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2317: Attachment: YARN-2317-071714.patch Hi folks, I've refreshed the WritingYarnApplications webpage to keep it consistent with some API changes. I'm using the YARN distributed shell as a sample and explained how the new (especially asynchronous) APIs work. This is my first draft of it. I would definitely appreciate comments/suggestions from the whole community on this. Thank you! Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Attachments: YARN-2317-071714.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
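For readers unfamiliar with the asynchronous client APIs that the refreshed page covers, here is a minimal sketch of registering an ApplicationMaster with AMRMClientAsync. It follows the Hadoop 2.x client library as best understood here and omits container requests, error handling, and shutdown; treat the details as approximate rather than as the documented example itself.
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AsyncAmSketch {
  public static void main(String[] args) throws Exception {
    AMRMClientAsync.CallbackHandler handler = new AMRMClientAsync.CallbackHandler() {
      @Override public void onContainersAllocated(List<Container> containers) {
        // Launch work on the newly allocated containers here.
      }
      @Override public void onContainersCompleted(List<ContainerStatus> statuses) {
        // Track completion / retries here.
      }
      @Override public void onNodesUpdated(List<NodeReport> updated) { }
      @Override public void onShutdownRequest() { }
      @Override public void onError(Throwable e) { }
      @Override public float getProgress() { return 0.0f; }
    };

    // Heartbeat every 1000 ms; the callbacks above are invoked asynchronously.
    AMRMClientAsync<AMRMClient.ContainerRequest> amRMClient =
        AMRMClientAsync.createAMRMClientAsync(1000, handler);
    amRMClient.init(new YarnConfiguration());
    amRMClient.start();
    amRMClient.registerApplicationMaster("appmaster-host", 0, "");
    // ... add ContainerRequests, do work, then unregister and stop ...
  }
}
{code}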
[jira] [Resolved] (YARN-2275) When log aggregation not enabled, message should point to NM HTTP port, not IPC port
[ https://issues.apache.org/jira/browse/YARN-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang resolved YARN-2275. -- Resolution: Won't Fix Unable to fix this using a single Configuration property. A patch that hacks around this by using two properties was considered not acceptable. Closing this bug as Won't Fix. When log aggregation not enabled, message should point to NM HTTP port, not IPC port - Key: YARN-2275 URL: https://issues.apache.org/jira/browse/YARN-2275 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Ray Chiang Labels: usability Attachments: MAPREDUCE5185-01.patch When I try to get a container's logs in the JHS without log aggregation enabled, I get a message that looks like this: Aggregation is not enabled. Try the nodemanager at sandy-ThinkPad-T530:33224 This could be a lot more helpful by actually pointing to the URL that would show the container logs on the NM. -- This message was sent by Atlassian JIRA (v6.2#6252)
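The two NodeManager address settings involved are presumably the RPC address and the webapp address. The sketch below only illustrates reading both from a Configuration; it is not the rejected patch and makes no claim about how the message would have been rewritten.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NmAddressExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // IPC (RPC) address: the port the unhelpful message currently points at.
    String rpcAddress = conf.get(YarnConfiguration.NM_ADDRESS);
    // HTTP (webapp) address: the one a log-viewing URL would actually need.
    String httpAddress = conf.get(YarnConfiguration.NM_WEBAPP_ADDRESS);
    System.out.println("NM RPC address:  " + rpcAddress);
    System.out.println("NM HTTP address: " + httpAddress);
  }
}
{code}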
[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned
[ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065791#comment-14065791 ] Hadoop QA commented on YARN-2045: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656386/YARN-2045-v6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4353//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4353//console This message is automatically generated. Data persisted in NM should be versioned Key: YARN-2045 URL: https://issues.apache.org/jira/browse/YARN-2045 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2045-v2.patch, YARN-2045-v3.patch, YARN-2045-v4.patch, YARN-2045-v5.patch, YARN-2045-v6.patch, YARN-2045.patch As a split task from YARN-667, we want to add version info to NM-related data, including: - NodeManager local LevelDB state - NodeManager directory structure -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065793#comment-14065793 ] Hadoop QA commented on YARN-2317: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656393/YARN-2317-071714.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4354//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4354//console This message is automatically generated. Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Attachments: YARN-2317-071714.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065795#comment-14065795 ] Hudson commented on YARN-1341: -- FAILURE: Integrated in Hadoop-trunk-Commit #5906 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5906/]) YARN-1341. Recover NMTokens upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611512) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseNMTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch, YARN-1341v7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2307) Capacity scheduler user only ADMINISTER_QUEUE also can submit app
[ https://issues.apache.org/jira/browse/YARN-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065840#comment-14065840 ] tangjunjie commented on YARN-2307: -- Queue acls for user : root Queue Operations = root default china ADMINISTER_QUEUE unfunded I think that if the root user could submit jobs, hadoop queue -showacls would display the following: Queue acls for user : root Queue Operations = root default china ADMINISTER_QUEUE,SUBMIT_APPLICATIONS unfunded Here is my configuration in detail: <configuration> <property> <name>yarn.scheduler.capacity.root.queues</name> <value>unfunded,china,default</value> </property> <property> <name>yarn.scheduler.capacity.root.capacity</name> <value>100</value> </property> <property> <name>yarn.scheduler.capacity.root.acl_submit_applications</name> <value>jj</value> </property> <property> <name>yarn.scheduler.capacity.root.acl_administer_queue</name> <value>jj</value> </property> <property> <name>yarn.scheduler.capacity.root.unfunded.acl_submit_applications</name> <value>xjj</value> </property> <property> <name>yarn.scheduler.capacity.root.unfunded.acl_administer_queue</name> <value>xjj</value> </property> <property> <name>yarn.scheduler.capacity.root.china.acl_submit_applications</name> <value>china1</value> </property> <property> <name>yarn.scheduler.capacity.root.china.acl_administer_queue</name> <value>china,root</value> </property> <property> <name>yarn.scheduler.capacity.root.unfunded.capacity</name> <value>40</value> </property> <property> <name>yarn.scheduler.capacity.root.china.capacity</name> <value>50</value> </property> <property> <name>yarn.scheduler.capacity.root.default.capacity</name> <value>10</value> </property> </configuration> Capacity scheduler user only ADMINISTER_QUEUE also can submit app -- Key: YARN-2307 URL: https://issues.apache.org/jira/browse/YARN-2307 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.3.0 Environment: hadoop 2.3.0 centos6.5 jdk1.7 Reporter: tangjunjie Priority: Minor Queue acls for user : root Queue Operations = root default china ADMINISTER_QUEUE unfunded User root only has ADMINISTER_QUEUE, but user root can submit an app to the china queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2317: Component/s: documentation Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Attachments: YARN-2317-071714.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065868#comment-14065868 ] Junping Du commented on YARN-1342: -- Yes. The patch has not been kept in sync for some time. [~jlowe], would you mind updating the patch against the latest trunk? Recover container tokens upon nodemanager restart - Key: YARN-1342 URL: https://issues.apache.org/jira/browse/YARN-1342 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1342.patch, YARN-1342v2.patch, YARN-1342v3-and-YARN-1987.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065910#comment-14065910 ] Wangda Tan commented on YARN-415: - Hi [~eepayne], Thanks for updating your patch, the failed test case should be irrelevant to your changes, it is tracked by YARN-2270. Reviewing.. Thanks, Wangda Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065942#comment-14065942 ] Tsuyoshi OZAWA commented on YARN-2229: -- [~jianhe], I'd appreciate it if you could take a look. ContainerId can overflow with RM restart Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch In YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch and the lower 22 bits are for the sequence number of the ids. This preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid the problem, it's better to make containerId a long. We need to define the new containerId format on this JIRA while preserving backward compatibility. -- This message was sent by Atlassian JIRA (v6.2#6252)
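To illustrate the overflow concern, here is a minimal sketch of packing a 10-bit epoch and a 22-bit sequence number into a 32-bit id. It is plain arithmetic for illustration, not the actual ContainerId implementation.
{code}
public class ContainerIdPackingExample {

  static final int SEQUENCE_BITS = 22;       // lower 22 bits: per-epoch sequence number
  static final int MAX_EPOCH = 1 << 10;      // upper 10 bits: only 1024 distinct epochs fit

  // Pack epoch and sequence into a single 32-bit id (illustrative only).
  static int pack(int epoch, int sequence) {
    return (epoch << SEQUENCE_BITS) | (sequence & ((1 << SEQUENCE_BITS) - 1));
  }

  public static void main(String[] args) {
    int a = pack(1, 7);
    int b = pack(1 + MAX_EPOCH, 7);
    // After 1024 RM restarts the epoch field wraps around, so two different
    // epochs produce the same packed id -- the overflow this JIRA describes.
    System.out.println(a == b);  // true: epoch 1 and epoch 1025 collide
  }
}
{code}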
[jira] [Updated] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2313: - Attachment: YARN-2313.2.patch Fixed the warning reported by findbugs. Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, YARN-2313.2.patch, rm-stack-trace.txt Observed a livelock on FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur: 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) when there are lots of queues. 2. UpdateThread goes into a busy loop. 3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
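The sketch below shows the general kind of update loop that can degenerate into a busy loop as described in the steps above; it is illustrative only and does not reproduce the actual FairScheduler UpdateThread. Once a single update() pass exceeds the interval, the leftover sleep is never positive, so the thread stops sleeping and keeps re-acquiring the lock that the other threads need.
{code}
// Illustrative sketch, not the actual FairScheduler code.
public class BusyUpdateLoopSketch {

  static final long UPDATE_INTERVAL_MS = 500;
  private final Object schedulerLock = new Object();

  void updateLoop() throws InterruptedException {
    while (true) {
      long start = System.currentTimeMillis();
      synchronized (schedulerLock) {
        update();                          // may take > 500 ms with many queues
      }
      long elapsed = System.currentTimeMillis() - start;
      long sleepMs = UPDATE_INTERVAL_MS - elapsed;
      if (sleepMs > 0) {
        Thread.sleep(sleepMs);             // never reached once update() is too slow,
      }                                    // so the lock is immediately re-acquired
    }
  }

  void update() { /* recompute fair shares for every queue */ }
}
{code}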
[jira] [Created] (YARN-2318) hadoop configuraion checker
tangjunjie created YARN-2318: Summary: hadoop configuraion checker Key: YARN-2318 URL: https://issues.apache.org/jira/browse/YARN-2318 Project: Hadoop YARN Issue Type: New Feature Reporter: tangjunjie Hadoop has a lot of config properties, and people make mistakes when modifying configuration files, so Hadoop could provide a config check tool. This tool could find mistakes such as the following misspelled property: <property> <name>mapreduce.tasktracker.reduce.tasks.maximu</name> <!-- should be mapreduce.tasktracker.reduce.tasks.maximum --> <value>9</value> <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description> </property> The tool could also warn about the use of deprecated property names and suggest the correct ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2318) hadoop configuraion checker
[ https://issues.apache.org/jira/browse/YARN-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tangjunjie updated YARN-2318: - Description: Hadoop has a lot of config properties, and people make mistakes when modifying configuration files, so Hadoop could provide a config check tool. This tool could find mistakes such as the following misspelled property: <property> <name>mapreduce.tasktracker.reduce.tasks.maximu</name> (should be mapreduce.tasktracker.reduce.tasks.maximum) <value>9</value> <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description> </property> The tool could also warn about the use of deprecated property names and suggest the correct ones. was: Hadoop has a lot of config properties, and people make mistakes when modifying configuration files, so Hadoop could provide a config check tool. This tool could find mistakes such as the following misspelled property: <property> <name>mapreduce.tasktracker.reduce.tasks.maximu</name> <!-- should be mapreduce.tasktracker.reduce.tasks.maximum --> <value>9</value> <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description> </property> The tool could also warn about the use of deprecated property names and suggest the correct ones. hadoop configuraion checker --- Key: YARN-2318 URL: https://issues.apache.org/jira/browse/YARN-2318 Project: Hadoop YARN Issue Type: New Feature Reporter: tangjunjie Hadoop has a lot of config properties, and people make mistakes when modifying configuration files, so Hadoop could provide a config check tool. This tool could find mistakes such as the following misspelled property: <property> <name>mapreduce.tasktracker.reduce.tasks.maximu</name> (should be mapreduce.tasktracker.reduce.tasks.maximum) <value>9</value> <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description> </property> The tool could also warn about the use of deprecated property names and suggest the correct ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
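A minimal sketch of the kind of checker being proposed (purely illustrative; no such tool exists in Hadoop, and the class name, resource names, and file paths below are assumptions): load the defaults shipped with Hadoop, treat their keys as "known", then flag user-configured keys that are deprecated or unknown.
{code}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch of the proposed checker, not an existing Hadoop tool.
public class ConfigChecker {
  public static void main(String[] args) {
    // Keys defined in the bundled defaults are treated as "known" (assumes the
    // relevant *-default.xml resources are on the classpath).
    Configuration defaults = new Configuration(false);
    defaults.addResource("core-default.xml");
    defaults.addResource("yarn-default.xml");
    defaults.addResource("mapred-default.xml");
    Set<String> knownKeys = new HashSet<>();
    for (Map.Entry<String, String> e : defaults) {
      knownKeys.add(e.getKey());
    }

    // The user's configuration to validate (resource name is an example).
    Configuration userConf = new Configuration(false);
    userConf.addResource("mapred-site.xml");
    for (Map.Entry<String, String> e : userConf) {
      String key = e.getKey();
      if (Configuration.isDeprecated(key)) {
        System.out.println("Deprecated property: " + key);
      } else if (!knownKeys.contains(key)) {
        System.out.println("Unknown property (possible typo): " + key);
      }
    }
  }
}
{code}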
[jira] [Commented] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065991#comment-14065991 ] Hadoop QA commented on YARN-2313: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656429/YARN-2313.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4355//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4355//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4355//console This message is automatically generated. Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, YARN-2313.2.patch, rm-stack-trace.txt Observed a livelock on FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur: 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) when there are lots of queues. 2. UpdateThread goes into a busy loop. 3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2313: - Attachment: YARN-2313.3.patch Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, rm-stack-trace.txt Observed a livelock on FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur: 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) when there are lots of queues. 2. UpdateThread goes into a busy loop. 3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
Wenwu Peng created YARN-2319: Summary: Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java Key: YARN-2319 URL: https://issues.apache.org/jira/browse/YARN-2319 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wenwu Peng Assignee: Wenwu Peng MiniKdc only invokes the start method, never stop, in TestRMWebServicesDelegationTokens.java: {code} testMiniKDC.start(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
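A minimal sketch of the usual pattern for fixing this kind of leak in a JUnit test (illustrative only; the field names and work directory are assumptions, not the actual TestRMWebServicesDelegationTokens code): start the KDC in setup and stop it in teardown so its port and threads are released.
{code}
import java.io.File;
import org.apache.hadoop.minikdc.MiniKdc;
import org.junit.After;
import org.junit.Before;

// Illustrative pattern only -- not the actual TestRMWebServicesDelegationTokens code.
public class MiniKdcLifecycleSketch {

  private MiniKdc testMiniKDC;
  private final File workDir = new File("target", "minikdc-work");

  @Before
  public void setUp() throws Exception {
    workDir.mkdirs();
    testMiniKDC = new MiniKdc(MiniKdc.createConf(), workDir);
    testMiniKDC.start();
  }

  @After
  public void tearDown() {
    if (testMiniKDC != null) {
      testMiniKDC.stop();   // the missing call: shut the KDC down after each test
    }
  }
}
{code}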
[jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066089#comment-14066089 ] Karthik Kambatla commented on YARN-2244: Latest patch looks good. A couple of nits: # The 80 chars limit doesn't apply to imports - can we get them one per line? {code} +import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity +.CapacityScheduler; +import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica +.FiCaSchedulerApp; {code} # Limit the following line to 80 chars. {code} +SchedulerApplicationAttempt application = getCurrentAttemptForContainer(containerId); {code} # Unused imports in CapacityScheduler, FairScheduler, FifoScheduler. # A few comments on the do-while loop in the test: {code} int waitCount = 0; dispatcher.await(); List<ContainerId> contsToClean; int cleanedConts = 0; do { contsToClean = resp.getContainersToCleanup(); cleanedConts += contsToClean.size(); if (cleanedConts >= 1) { break; } Thread.sleep(100); resp = nm.nodeHeartbeat(true); dispatcher.await(); } while (waitCount++ < 200); {code} ## Define waitCount and cleanedConts on the same line? ## while should be on the same line as the closing brace. ## Remove dispatcher.await() outside the loop and have it as the first statement in the do-block? FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch We are missing the changes from MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler-specific fixes added to handle containers for unknown application attempts. Without these, the fair scheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
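For reference, a sketch of what the loop might look like after applying the three nits above. This is illustrative only, not the actual test code; resp, nm, and dispatcher are assumed to come from the surrounding test harness.
{code}
// Sketch of the loop after the review suggestions (not the actual patch).
int waitCount = 0, cleanedConts = 0;           // nit 1: declared on one line
List<ContainerId> contsToClean;
do {                                           // nit 3: await() moved inside the loop
  dispatcher.await();
  contsToClean = resp.getContainersToCleanup();
  cleanedConts += contsToClean.size();
  if (cleanedConts >= 1) {
    break;
  }
  Thread.sleep(100);
  resp = nm.nodeHeartbeat(true);
} while (waitCount++ < 200);                   // nit 2: while on the closing-brace line
{code}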
[jira] [Commented] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066091#comment-14066091 ] Hadoop QA commented on YARN-2313: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656445/YARN-2313.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4356//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4356//console This message is automatically generated. Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, rm-stack-trace.txt Observed a livelock on FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur: 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) when there are lots of queues. 2. UpdateThread goes into a busy loop. 3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)