[jira] [Commented] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237677#comment-14237677
 ] 

Hadoop QA commented on YARN-2917:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685697/0001-YARN-2917.patch
  against trunk revision 120e1de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6029//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6029//console

This message is automatically generated.

 Potential deadlock in AsyncDispatcher when system.exit called in 
 AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook
 

 Key: YARN-2917
 URL: https://issues.apache.org/jira/browse/YARN-2917
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2917.patch


 I encountered a scenario where the RM hung while shutting down and kept logging 
 {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
 Waiting for AsyncDispatcher to drain.}}
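 For illustration, a minimal sketch of the suspected interaction (only AsyncDispatcher and the quoted log line come from this report; the rest is an assumption about how the pieces line up): the dispatcher thread hits a fatal error in dispatch and calls System.exit(), System.exit() blocks until all JVM shutdown hooks complete, and the shutdown hook runs serviceStop(), which waits for the dispatcher to drain an event queue that can no longer make progress.
 {code}
 // Hedged sketch of the deadlock pattern; not the actual AsyncDispatcher code.
 public class DispatcherDeadlockSketch {
   private volatile boolean drained = false;

   // Runs on the dispatcher thread.
   void dispatch(Object event) {
     try {
       throw new Error("fatal error while handling " + event);
     } catch (Throwable t) {
       // Blocks until every shutdown hook has finished -- including the one below.
       System.exit(-1);
     }
   }

   // Invoked from a JVM shutdown hook via the service stop path.
   void serviceStop() throws InterruptedException {
     while (!drained) {
       System.out.println("Waiting for AsyncDispatcher to drain.");
       Thread.sleep(1000); // never ends: the only thread that could drain is stuck in System.exit()
     }
   }
 }
 {code}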



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2927) InMemorySCMStore properties are inconsistent

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237695#comment-14237695
 ] 

Hudson commented on YARN-2927:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #33 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/33/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 InMemorySCMStore properties are inconsistent
 

 Key: YARN-2927
 URL: https://issues.apache.org/jira/browse/YARN-2927
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: newbie, supportability
 Fix For: 2.7.0

 Attachments: YARN-2927.001.patch, YARN-2927.002.patch


 I see these properties in the yarn-default.xml file:
   yarn.sharedcache.store.in-memory.check-period-mins
   yarn.sharedcache.store.in-memory.initial-delay-mins
   yarn.sharedcache.store.in-memory.staleness-period-mins
 YarnConfiguration looks like it's missing some properties:
   public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
   public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";
   public static final String IN_MEMORY_STORE_PREFIX = SHARED_CACHE_PREFIX + "in-memory.";
   public static final String IN_MEMORY_STALENESS_PERIOD_MINS = IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
 It looks like the definition for IN_MEMORY_STORE_PREFIX should be:
   public static final String IN_MEMORY_STORE_PREFIX = SCM_STORE_PREFIX + "in-memory.";
 Just to be clear, there are properties that exist in yarn-default.xml that 
 are effectively misspelled in the *Java* file, not the .xml file.  This is 
 similar to YARN-2461 and MAPREDUCE-6087.
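 A minimal sketch of how the constants would read with that change (quotes restored here for readability; this reflects the suggestion above, not necessarily the exact attached patch):
 {code}
 // Suggested definitions, so the resulting keys match the existing
 // yarn.sharedcache.store.in-memory.* entries in yarn-default.xml.
 public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
 public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";
 public static final String IN_MEMORY_STORE_PREFIX = SCM_STORE_PREFIX + "in-memory.";
 public static final String IN_MEMORY_STALENESS_PERIOD_MINS =
     IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
 // IN_MEMORY_STALENESS_PERIOD_MINS now resolves to
 // "yarn.sharedcache.store.in-memory.staleness-period-mins".
 {code}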



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237694#comment-14237694
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #33 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/33/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2927) InMemorySCMStore properties are inconsistent

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237703#comment-14237703
 ] 

Hudson commented on YARN-2927:
--

ABORTED: Integrated in Hadoop-Mapreduce-trunk #1984 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1984/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 InMemorySCMStore properties are inconsistent
 

 Key: YARN-2927
 URL: https://issues.apache.org/jira/browse/YARN-2927
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: newbie, supportability
 Fix For: 2.7.0

 Attachments: YARN-2927.001.patch, YARN-2927.002.patch


 I see these properties in the yarn-default.xml file:
   yarn.sharedcache.store.in-memory.check-period-mins
   yarn.sharedcache.store.in-memory.initial-delay-mins
   yarn.sharedcache.store.in-memory.staleness-period-mins
 YarnConfiguration looks like it's missing some properties:
   public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
   public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";
   public static final String IN_MEMORY_STORE_PREFIX = SHARED_CACHE_PREFIX + "in-memory.";
   public static final String IN_MEMORY_STALENESS_PERIOD_MINS = IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
 It looks like the definition for IN_MEMORY_STORE_PREFIX should be:
   public static final String IN_MEMORY_STORE_PREFIX = SCM_STORE_PREFIX + "in-memory.";
 Just to be clear, there are properties that exist in yarn-default.xml that 
 are effectively misspelled in the *Java* file, not the .xml file.  This is 
 similar to YARN-2461 and MAPREDUCE-6087.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237702#comment-14237702
 ] 

Hudson commented on YARN-1492:
--

ABORTED: Integrated in Hadoop-Mapreduce-trunk #1984 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1984/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced

2014-12-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237755#comment-14237755
 ] 

Varun Saxena commented on YARN-2136:


Thanks [~jianhe] for reviewing and committing this.

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Jian He
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2136.002.patch, YARN-2136.003.patch, 
 YARN-2136.004.patch, YARN-2136.005.patch, YARN-2136.patch


 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if state store is at fenced state. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237766#comment-14237766
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #32 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/32/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2927) InMemorySCMStore properties are inconsistent

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237767#comment-14237767
 ] 

Hudson commented on YARN-2927:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #32 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/32/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 InMemorySCMStore properties are inconsistent
 

 Key: YARN-2927
 URL: https://issues.apache.org/jira/browse/YARN-2927
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: newbie, supportability
 Fix For: 2.7.0

 Attachments: YARN-2927.001.patch, YARN-2927.002.patch


 I see these properties in the yarn-default.xml file:
   yarn.sharedcache.store.in-memory.check-period-mins
   yarn.sharedcache.store.in-memory.initial-delay-mins
   yarn.sharedcache.store.in-memory.staleness-period-mins
 YarnConfiguration looks like it's missing some properties:
   public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
   public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";
   public static final String IN_MEMORY_STORE_PREFIX = SHARED_CACHE_PREFIX + "in-memory.";
   public static final String IN_MEMORY_STALENESS_PERIOD_MINS = IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
 It looks like the definition for IN_MEMORY_STORE_PREFIX should be:
   public static final String IN_MEMORY_STORE_PREFIX = SCM_STORE_PREFIX + "in-memory.";
 Just to be clear, there are properties that exist in yarn-default.xml that 
 are effectively misspelled in the *Java* file, not the .xml file.  This is 
 similar to YARN-2461 and MAPREDUCE-6087.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1423#comment-1423
 ] 

Rohith commented on YARN-2762:
--

I believe the test failures are either unrelated or transient.

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
 YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.patch


 All NodeLabel argument validation is done on the server side. The same can be done 
 in RMAdminCLI so that unnecessary RPC calls are avoided.
 And for input such as "x,y,,z,", there is no need to add empty strings; they can 
 simply be skipped.
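 As a rough illustration of that client-side handling (the class and method names below are made up, not the actual RMAdminCLI code), the sketch trims each label and skips empty entries before anything is sent to the RM:
 {code}
 import java.util.LinkedHashSet;
 import java.util.Set;

 // Hedged sketch: normalize a comma-separated label argument such as "x,y,,z,".
 public class NodeLabelArgs {
   static Set<String> parseLabels(String arg) {
     Set<String> labels = new LinkedHashSet<String>();
     if (arg == null) {
       return labels;
     }
     for (String raw : arg.split(",")) {
       String label = raw.trim();
       if (!label.isEmpty()) {  // skip the empty strings produced by ",," or a trailing ","
         labels.add(label);
       }
     }
     return labels;             // "x,y,,z," yields [x, y, z]
   }
 }
 {code}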



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237782#comment-14237782
 ] 

Hadoop QA commented on YARN-2637:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685475/YARN-2637.16.patch
  against trunk revision 120e1de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
  
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6030//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6030//console

This message is automatically generated.

 maximum-am-resource-percent could be violated when resource of AM is > 
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app can be 
 activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 For example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
 resource that AMs can use is 200M. Assuming minimum_allocation=1M, up to 200 AMs can be 
 launched. If each AM actually uses 5M (> minimum_allocation), all 
 apps can still be activated, and they will occupy all of the queue's resources 
 instead of only max_am_resource_percent of the queue.
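 A hedged sketch of the kind of accounting that would avoid this (the accessor getAMResource() and the variable names are illustrative, not necessarily what the attached patches use): charge each activated application's actual AM resource against the queue's AM limit, instead of counting applications against a number derived from minimum_allocation.
 {code}
 // Illustrative only: activate pending apps while their actual AM resource still
 // fits under queue_max_capacity * maximum_am_resource_percent.
 Resource maxAMResource = Resources.multiply(queueMaxCapacity, maxAMResourcePercent);
 Resource amResourceUsage = Resources.createResource(0);

 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext();) {
   FiCaSchedulerApp application = i.next();
   Resource amDemand = application.getAMResource();  // assumed accessor for the AM's requested resource
   if (Resources.greaterThan(resourceCalculator, clusterResource,
       Resources.add(amResourceUsage, amDemand), maxAMResource)) {
     break;  // activating this AM would push the queue past its AM share
   }
   amResourceUsage = Resources.add(amResourceUsage, amDemand);
   activeApplications.add(application);
   i.remove();
 }
 {code}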



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237802#comment-14237802
 ] 

Hudson commented on YARN-1492:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #769 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/769/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2927) InMemorySCMStore properties are inconsistent

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237803#comment-14237803
 ] 

Hudson commented on YARN-2927:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #769 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/769/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 InMemorySCMStore properties are inconsistent
 

 Key: YARN-2927
 URL: https://issues.apache.org/jira/browse/YARN-2927
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: newbie, supportability
 Fix For: 2.7.0

 Attachments: YARN-2927.001.patch, YARN-2927.002.patch


 I see these properties in the yarn-default.xml file:
   yarn.sharedcache.store.in-memory.check-period-mins
   yarn.sharedcache.store.in-memory.initial-delay-mins
   yarn.sharedcache.store.in-memory.staleness-period-mins
 YarnConfiguration looks like it's missing some properties:
   public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
   public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";
   public static final String IN_MEMORY_STORE_PREFIX = SHARED_CACHE_PREFIX + "in-memory.";
   public static final String IN_MEMORY_STALENESS_PERIOD_MINS = IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
 It looks like the definition for IN_MEMORY_STORE_PREFIX should be:
   public static final String IN_MEMORY_STORE_PREFIX = SCM_STORE_PREFIX + "in-memory.";
 Just to be clear, there are properties that exist in yarn-default.xml that 
 are effectively misspelled in the *Java* file, not the .xml file.  This is 
 similar to YARN-2461 and MAPREDUCE-6087.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2291) Timeline and RM web services should use same authentication code

2014-12-08 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237805#comment-14237805
 ] 

Varun Vasudev commented on YARN-2291:
-

YARN-2656 changes the RM to use the filter in hadoop-common. Closing this.

 Timeline and RM web services should use same authentication code
 

 Key: YARN-2291
 URL: https://issues.apache.org/jira/browse/YARN-2291
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.6.0


 The TimelineServer and the RM web services have very similar requirements and 
 implementation for authentication via delegation tokens apart from the fact 
 that the RM web services require delegation tokens to be passed as a header. 
 They should use the same code base instead of different implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2291) Timeline and RM web services should use same authentication code

2014-12-08 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev resolved YARN-2291.
-
   Resolution: Fixed
Fix Version/s: 2.6.0

 Timeline and RM web services should use same authentication code
 

 Key: YARN-2291
 URL: https://issues.apache.org/jira/browse/YARN-2291
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.6.0


 The TimelineServer and the RM web services have very similar requirements and 
 implementation for authentication via delegation tokens apart from the fact 
 that the RM web services require delegation tokens to be passed as a header. 
 They should use the same code base instead of different implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2292) RM web services should use hadoop-common for authentication using delegation tokens

2014-12-08 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev resolved YARN-2292.
-
   Resolution: Fixed
Fix Version/s: 2.6.0

YARN-2656 changes the RM to use the filter in hadoop-common. Closing this.

 RM web services should use hadoop-common for authentication using delegation 
 tokens
 ---

 Key: YARN-2292
 URL: https://issues.apache.org/jira/browse/YARN-2292
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.6.0


 HADOOP-10771 refactors the WebHDFS authentication code to hadoop-common. 
 YARN-2290 will add support for passing delegation tokens via headers. Once 
 support is added, the RM web services should use the authentication code from 
 hadoop-common.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2426) ResourceManager is not able to renew WebHDFS token when application is submitted by YARN WebService

2014-12-08 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev resolved YARN-2426.
-
   Resolution: Fixed
Fix Version/s: 2.6.0

Fixed with HDFS-6904 exposing an API to allow clients to set the service.

 ResourceManager is not able to renew WebHDFS token when application is submitted by 
 YARN WebService
 

 Key: YARN-2426
 URL: https://issues.apache.org/jira/browse/YARN-2426
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Hadoop Kerberos (secure) cluster with 
 LinuxContainerExecutor enabled,
 with SPNEGO on for YARN's new RM web services for application submission,
 so the application-submission XML/JSON structure passes a WebHDFS token
Reporter: Karam Singh
Assignee: Varun Vasudev
 Fix For: 2.6.0


 Encountered this issue while using YARN's new RM WS (web service) for application 
 submission, on a single-node cluster, while submitting a Distributed Shell 
 application through the RM WS.
 For this we need to pass a custom script and the AppMaster jar along with a WebHDFS 
 token.
 The application was failing because the ResourceManager could not renew the token for 
 the user (appOwner), so the RM was rejecting the application with the following exception 
 trace in the RM log:
 {code}
 2014-08-19 03:12:54,733 WARN  security.DelegationTokenRenewer 
 (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(661)) - Unable to 
 add the application to the delegation token renewer.
 java.io.IOException: Failed to renew token: Kind: WEBHDFS delegation, 
 Service: NNHOST:FSPORT, Ident: (WEBHDFS delegation token  for hrt_qa)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:394)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$5(DelegationTokenRenewer.java:357)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Unexpected HTTP response: code=-1 != 200, 
 op=RENEWDELEGATIONTOKEN, message=null
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:331)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:90)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:598)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:448)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:477)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:473)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.renewDelegationToken(WebHdfsFileSystem.java:1318)
 at 
 org.apache.hadoop.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:73)
 at org.apache.hadoop.security.token.Token.renew(Token.java:377)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:477)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:1)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:473)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:392)
 ... 6 more
 Caused by: java.io.IOException: The error stream is null.
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.jsonParse(WebHdfsFileSystem.java:304)
 at 
 

[jira] [Commented] (YARN-2927) InMemorySCMStore properties are inconsistent

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237830#comment-14237830
 ] 

Hudson commented on YARN-2927:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1964 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1964/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 InMemorySCMStore properties are inconsistent
 

 Key: YARN-2927
 URL: https://issues.apache.org/jira/browse/YARN-2927
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: newbie, supportability
 Fix For: 2.7.0

 Attachments: YARN-2927.001.patch, YARN-2927.002.patch


 I see these properties in the yarn-default.xml file:
   yarn.sharedcache.store.in-memory.check-period-mins
   yarn.sharedcache.store.in-memory.initial-delay-mins
   yarn.sharedcache.store.in-memory.staleness-period-mins
 YarnConfiguration looks like it's missing some properties:
   public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
   public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";
   public static final String IN_MEMORY_STORE_PREFIX = SHARED_CACHE_PREFIX + "in-memory.";
   public static final String IN_MEMORY_STALENESS_PERIOD_MINS = IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
 It looks like the definition for IN_MEMORY_STORE_PREFIX should be:
   public static final String IN_MEMORY_STORE_PREFIX = SCM_STORE_PREFIX + "in-memory.";
 Just to be clear, there are properties that exist in yarn-default.xml that 
 are effectively misspelled in the *Java* file, not the .xml file.  This is 
 similar to YARN-2461 and MAPREDUCE-6087.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237829#comment-14237829
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1964 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1964/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2927) InMemorySCMStore properties are inconsistent

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237890#comment-14237890
 ] 

Hudson commented on YARN-2927:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #32 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/32/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 InMemorySCMStore properties are inconsistent
 

 Key: YARN-2927
 URL: https://issues.apache.org/jira/browse/YARN-2927
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: newbie, supportability
 Fix For: 2.7.0

 Attachments: YARN-2927.001.patch, YARN-2927.002.patch


 I see these properties in the yarn-default.xml file:
   yarn.sharedcache.store.in-memory.check-period-mins
   yarn.sharedcache.store.in-memory.initial-delay-mins
   yarn.sharedcache.store.in-memory.staleness-period-mins
 YarnConfiguration looks like it's missing some properties:
   public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
   public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";
   public static final String IN_MEMORY_STORE_PREFIX = SHARED_CACHE_PREFIX + "in-memory.";
   public static final String IN_MEMORY_STALENESS_PERIOD_MINS = IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
 It looks like the definition for IN_MEMORY_STORE_PREFIX should be:
   public static final String IN_MEMORY_STORE_PREFIX = SCM_STORE_PREFIX + "in-memory.";
 Just to be clear, there are properties that exist in yarn-default.xml that 
 are effectively misspelled in the *Java* file, not the .xml file.  This is 
 similar to YARN-2461 and MAPREDUCE-6087.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237889#comment-14237889
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #32 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/32/])
YARN-2927. [YARN-1492] InMemorySCMStore properties are inconsistent. (Ray 
Chiang via kasha) (kasha: rev 120e1decd7f6861e753269690d454cb14c240857)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237910#comment-14237910
 ] 

Hadoop QA commented on YARN-2902:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685607/YARN-2902.002.patch
  against trunk revision 8963515.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6031//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6031//console

This message is automatically generated.

 Killing a container that is localizing can orphan resources in the 
 DOWNLOADING state
 

 Key: YARN-2902
 URL: https://issues.apache.org/jira/browse/YARN-2902
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2902.002.patch, YARN-2902.patch


 If a container is in the process of localizing when it is stopped/killed then 
 resources are left in the DOWNLOADING state.  If no other container comes 
 along and requests these resources they linger around with no reference 
 counts but aren't cleaned up during normal cache cleanup scans since it will 
 never delete resources in the DOWNLOADING state even if their reference count 
 is zero.
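 Purely as an illustration of the cleanup behaviour argued for here (the types and helpers below are invented, not the NodeManager's actual LocalResourcesTracker API): a cache scan could treat a DOWNLOADING resource with zero references as eligible for deletion instead of skipping it forever.
 {code}
 // Hedged sketch with invented types: clean up orphaned DOWNLOADING resources
 // once nothing references them, alongside the normal retention-based cleanup.
 for (CachedResource rsrc : cachedResources) {
   boolean unreferenced = rsrc.getRefCount() == 0;
   if (unreferenced && rsrc.getState() == ResourceState.DOWNLOADING) {
     deleteFromLocalCache(rsrc);  // invented helper: removes the partial download on disk
   } else if (unreferenced && rsrc.getState() == ResourceState.LOCALIZED
       && rsrc.isOlderThan(retentionMillis)) {
     deleteFromLocalCache(rsrc);  // existing-style cleanup of stale localized resources
   }
 }
 {code}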



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException

2014-12-08 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-2910:

Attachment: YARN-2910.4.patch

I did not change the assignment :-(

yes, the {{when(schedulable.getResourceUsage()).thenReturn(smallResource);}} 
should not have been in the patch, my mistake. I'm not sure how it ended up in 
the patch; I used it during development but not in the last tests.

On my machine the test failed with just adding applications. The issue seems to 
be in the initialisation of the application attempt. When I added debug output to 
the test run I could see the initialisation of the app attempt in the mock taking 
up a lot of time, which meant that {{getResourceUsage}} almost always ran 
over an empty list unless the number of iterations was raised above 1000. As 
soon as I moved the creation out of the thread, the failure occurred within 5 
iterations of the {{getResourceUsage}} call in the second thread, after adding 
fewer than 15 or so app instances.

I have attached an updated patch which passes with the new code and has a 100% 
failure rate with the old code. This version of the test runs faster and is 
more reliable than the previous ones.

 FSLeafQueue can throw ConcurrentModificationException
 -

 Key: YARN-2910
 URL: https://issues.apache.org/jira/browse/YARN-2910
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
Reporter: Wilfred Spiegelenburg
Assignee: Ray Chiang
 Attachments: FSLeafQueue_concurrent_exception.txt, 
 YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, 
 YARN-2910.4.patch, YARN-2910.patch


 The lists that maintain the runnable and the non-runnable apps are standard 
 ArrayLists, but there is no guarantee that they will only be manipulated by one 
 thread in the system. This can lead to the following exception:
 {noformat}
 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM.
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
 at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
 at java.util.ArrayList$Itr.next(ArrayList.java:831)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
 {noformat}
 The full stack trace is in the attached file.
 We should guard against that by using a thread-safe alternative such as 
 java.util.concurrent.CopyOnWriteArrayList.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2922) Concurrent Modification Exception in LeafQueue when collecting applications

2014-12-08 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2922:
-
Attachment: 0001-YARN-2922.patch

 Concurrent Modification Exception in LeafQueue when collecting applications
 ---

 Key: YARN-2922
 URL: https://issues.apache.org/jira/browse/YARN-2922
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Rohith
 Attachments: 0001-YARN-2922.patch


 java.util.ConcurrentModificationException
 at 
 java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
 at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798)
 at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234)
 at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster

2014-12-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238000#comment-14238000
 ] 

Junping Du commented on YARN-2892:
--

Wait... I looked at the patch again; it looks like it will introduce a serious 
incompatibility: the ApplicationReport returned to the client now includes the short name 
rather than the full name as before. We should be extra careful now that we are 
supporting YARN rolling upgrades since 2.6. Are there other ways to do this?

 Unable to get AMRMToken in unmanaged AM when using a secure cluster
 ---

 Key: YARN-2892
 URL: https://issues.apache.org/jira/browse/YARN-2892
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sevada Abraamyan
Assignee: Sevada Abraamyan
 Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch


 An AMRMToken is retrieved from the ApplicationReport by the YarnClient. 
 When the RM creates the ApplicationReport and sends it back to the client, it 
 makes a simple security check on whether it should include the AMRMToken in the 
 report (see createAndGetApplicationReport in RMAppImpl). This security check 
 verifies that the user who submitted the original application is the same 
 user who is requesting the ApplicationReport. If they are indeed the same 
 user then it includes the AMRMToken, otherwise it does not include it.
 The problem arises from the fact that when an application is submitted, the 
 RM saves the short username of the user who created the application (see 
 submitApplication in ClientRMService). Afterwards, when the ApplicationReport 
 is requested, the system tries to match the full username of the requester 
 against the previously stored short username. 
 In a secure cluster using Kerberos this check fails because the realm of the 
 principal is stripped when we derive the short username. So, for example, 
 the short username might be "Foo" whereas the full username is 
 f...@company.com
 Note: A very similar problem has been previously reported 
 ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232])
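 For illustration only (this is not the attached patch, and the comment above raises compatibility concerns about changing what the ApplicationReport returns): one way to make the check self-consistent is to normalize the caller's name to its Kerberos short form before comparing it with the stored submitter.
 {code}
 import org.apache.hadoop.security.UserGroupInformation;

 public class SubmitterCheck {
   // Hedged sketch: getShortUserName() strips the realm from a Kerberos principal,
   // so a caller "foo@EXAMPLE.COM" matches a stored short name "foo".
   static boolean sameSubmitter(UserGroupInformation callerUgi, String storedShortUser) {
     return callerUgi.getShortUserName().equals(storedShortUser);
   }
 }
 {code}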



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException

2014-12-08 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238007#comment-14238007
 ] 

Wilfred Spiegelenburg commented on YARN-2910:
-

The fix causes 
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler}}
 to fail.
There is a deadlock created by the synchronised read access in the leaf 
queue for the {{runnableApps}}. If an app has two containers at different 
stages in the allocation, it can happen that the {{appAttempt}} is locked by one 
and the {{runnableApps}} by the second, causing the hang.

This is what I was afraid of when I mentioned the slowdown; I did not 
anticipate it being this bad, but the number of reads far outnumbers the writes.
The earlier proposed CopyOnWriteArrayList will also not work because of the sort 
that is called (and that I overlooked), which it does not support.
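One possible direction, sketched here only as an illustration and not as the eventual fix: keep the plain list, hold a lock just long enough to take a snapshot, and let readers iterate or sort the copy so they never race with writers and never hold the queue lock while doing scheduler work.
{code}
// Hedged sketch: copy-under-lock so reads and sorts work on a private snapshot.
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class RunnableAppsSketch<T> {
  private final List<T> runnableApps = new ArrayList<T>();
  private final Object lock = new Object();

  void add(T app) {
    synchronized (lock) {
      runnableApps.add(app);
    }
  }

  // Readers never touch the live list outside the lock, so there is no
  // ConcurrentModificationException and no long-held lock during the sort.
  List<T> sortedSnapshot(Comparator<T> comparator) {
    List<T> copy;
    synchronized (lock) {
      copy = new ArrayList<T>(runnableApps);
    }
    Collections.sort(copy, comparator);
    return copy;
  }
}
{code}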

 FSLeafQueue can throw ConcurrentModificationException
 -

 Key: YARN-2910
 URL: https://issues.apache.org/jira/browse/YARN-2910
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
Reporter: Wilfred Spiegelenburg
Assignee: Ray Chiang
 Attachments: FSLeafQueue_concurrent_exception.txt, 
 YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, 
 YARN-2910.4.patch, YARN-2910.patch


 The list that maintains the runnable and the non runnable apps are a standard 
 ArrayList but there is no guarantee that it will only be manipulated by one 
 thread in the system. This can lead to the following exception:
 {noformat}
 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM.
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
 at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
 at java.util.ArrayList$Itr.next(ArrayList.java:831)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
 {noformat}
 Full stack trace in the attached file.
 We should guard against that by using a thread safe version from 
 java.util.concurrent.CopyOnWriteArrayList



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2464) Provide Hadoop as a local resource (on HDFS) which can be used by other projects

2014-12-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2464:
-
Summary: Provide Hadoop as a local resource (on HDFS) which can be used by 
other projects  (was: Provide Hadoop as a local resource (on HDFS) which can be 
used by other projcets)

 Provide Hadoop as a local resource (on HDFS) which can be used by other 
 projects
 

 Key: YARN-2464
 URL: https://issues.apache.org/jira/browse/YARN-2464
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Junping Du

 DEFAULT_YARN_APPLICATION_CLASSPATH is used by YARN projects to set up their 
 AM / task classpaths if they have a dependency on Hadoop libraries.
 It'll be useful to provide similar access to a Hadoop tarball (Hadoop libs, 
 native libraries, etc.), which could be used instead - for applications which 
 do not want to rely upon Hadoop versions from a cluster node. This would also 
 require functionality to update the classpath/env for the apps based on the 
 structure of the tar.
 As an example, MR has support for a full tar (for rolling upgrades). 
 Similarly, Tez ships Hadoop libraries along with its build. I'm not sure 
 about the Spark / Storm / HBase model for this - but using a common copy 
 instead of everyone localizing Hadoop libraries would be useful.
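 To make the idea concrete, a hedged sketch of how an application could already localize such a shared tarball from HDFS with the existing YARN records API (the HDFS path is an assumption, and the classpath/env wiring discussed above is not shown):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.yarn.api.records.LocalResource;
 import org.apache.hadoop.yarn.api.records.LocalResourceType;
 import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
 import org.apache.hadoop.yarn.util.ConverterUtils;

 public class HadoopTarballResource {
   // Hedged sketch: register an HDFS-hosted Hadoop tarball as a PUBLIC archive so
   // the NodeManager localizes and unpacks it once per node and containers share it.
   static LocalResource forTarball(Configuration conf, Path tarball) throws Exception {
     FileStatus status = tarball.getFileSystem(conf).getFileStatus(tarball);
     return LocalResource.newInstance(
         ConverterUtils.getYarnUrlFromPath(tarball),  // e.g. an assumed hdfs:///apps/hadoop/hadoop.tar.gz
         LocalResourceType.ARCHIVE,                   // extracted on localization
         LocalResourceVisibility.PUBLIC,              // shared across users and applications
         status.getLen(), status.getModificationTime());
   }
 }
 {code}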



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2922) Concurrent Modification Exception in LeafQueue when collecting applications

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238058#comment-14238058
 ] 

Hadoop QA commented on YARN-2922:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685762/0001-YARN-2922.patch
  against trunk revision 8963515.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6034//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6034//console

This message is automatically generated.

 Concurrent Modification Exception in LeafQueue when collecting applications
 ---

 Key: YARN-2922
 URL: https://issues.apache.org/jira/browse/YARN-2922
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Rohith
 Attachments: 0001-YARN-2922.patch


 java.util.ConcurrentModificationException
 at 
 java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
 at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798)
 at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234)
 at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2014-12-08 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238075#comment-14238075
 ] 

Mit Desai commented on YARN-2900:
-

Thanks. I will check that out.

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch, YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238096#comment-14238096
 ] 

Hadoop QA commented on YARN-2910:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685755/YARN-2910.4.patch
  against trunk revision 8963515.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
  
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6033//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6033//console

This message is automatically generated.

 FSLeafQueue can throw ConcurrentModificationException
 -

 Key: YARN-2910
 URL: https://issues.apache.org/jira/browse/YARN-2910
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
Reporter: Wilfred Spiegelenburg
Assignee: Ray Chiang
 Attachments: FSLeafQueue_concurrent_exception.txt, 
 YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, 
 YARN-2910.4.patch, YARN-2910.patch


 The lists that maintain the runnable and the non-runnable apps are standard 
 ArrayLists, but there is no guarantee that they will only be manipulated by one 
 thread in the system. This can lead to the following exception:
 {noformat}
 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM.
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
 at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
 at java.util.ArrayList$Itr.next(ArrayList.java:831)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
 {noformat}
 Full stack trace in the attached file.
 We should guard against that by using a thread-safe alternative from 
 java.util.concurrent, such as CopyOnWriteArrayList.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2683) registry config options: document and move to core-default

2014-12-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2683:
-
Attachment: HADOOP-10530-005.patch

patch -005; rebased against trunk commit  144da2

 registry config options: document and move to core-default
 --

 Key: YARN-2683
 URL: https://issues.apache.org/jira/browse/YARN-2683
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-10530-005.patch, YARN-2683-001.patch, 
 YARN-2683-002.patch, YARN-2683-003.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Add to {{yarn-site}} a page on registry configuration parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238130#comment-14238130
 ] 

Wangda Tan commented on YARN-2762:
--

[~rohithsharma],
Thanks for the update.
Two last minor comments:
- Rename {{NO_LABEL}} to {{NO_LABEL_ERR_MSG}}.
- Make {{No node-to-labels mappings are specified}} a final field as well, 
like {{NO_MAPPING_ERR_MSG}}.

Wangda

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
 YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.patch


 All NodeLabel argument validations are done at the server side. The same can be 
 done at RMAdminCLI so that unnecessary RPC calls can be avoided.
 And for input such as "x,y,,z,", there is no need to add an empty string; it 
 can simply be skipped.
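 A minimal sketch of the kind of client-side trimming and skipping being 
 described (illustrative only, not the actual RMAdminCLI code):
 {code}
 import java.util.LinkedHashSet;
 import java.util.Set;

 // Illustrative only: trim each label and skip empty entries so that input
 // like "x,y,,z," becomes [x, y, z] before any RPC is made.
 class LabelArgs {
   static Set<String> parseLabels(String arg) {
     Set<String> labels = new LinkedHashSet<String>();
     for (String label : arg.split(",")) {
       String trimmed = label.trim();
       if (!trimmed.isEmpty()) { // skip empty strings instead of adding them
         labels.add(trimmed);
       }
     }
     return labels;
   }

   public static void main(String[] args) {
     System.out.println(parseLabels(" x , y,,z, ")); // prints [x, y, z]
   }
 }
 {code}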



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster

2014-12-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238137#comment-14238137
 ] 

Rohith commented on YARN-2892:
--

bq. the applicationReport return to client include short name now rather than 
full name before.
I did not get where exactly compatibility is being broken. Before the patch as 
well, the application report sends the short name instead of the full name. Am I missing anything?

 Unable to get AMRMToken in unmanaged AM when using a secure cluster
 ---

 Key: YARN-2892
 URL: https://issues.apache.org/jira/browse/YARN-2892
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sevada Abraamyan
Assignee: Sevada Abraamyan
 Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch


 An AMRMToken is retrieved from the ApplicationReport by the YarnClient. 
 When the RM creates the ApplicationReport and sends it back to the client, it 
 makes a simple security check on whether it should include the AMRMToken in the 
 report (see createAndGetApplicationReport in RMAppImpl). This security check 
 verifies that the user who submitted the original application is the same 
 user who is requesting the ApplicationReport. If they are indeed the same 
 user then it includes the AMRMToken, otherwise it does not include it.
 The problem arises from the fact that when an application is submitted, the 
 RM saves the short username of the user who created the application (see 
 submitApplication in ClientRmService). Afterwards, when the ApplicationReport 
 is requested, the system tries to match the full username of the requester 
 against the previously stored short username. 
 In a secure cluster using Kerberos this check fails because the realm is 
 stripped from the principal when we request a short username. So, for example, 
 the short username might be "Foo" whereas the full username is 
 f...@company.com
 Note: A very similar problem has been previously reported 
 ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232])
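 A tiny illustration of why the check fails (the values are hypothetical and 
 this is not the RM code; it only models the comparison described above):
 {code}
 // Illustrative only: the RM stored the short name at submit time, but the
 // requester is identified by the full Kerberos principal, so equals() fails.
 public class NameMismatch {
   public static void main(String[] args) {
     String storedAtSubmit = "foo";             // short name saved by the RM (hypothetical)
     String requesterName = "foo@COMPANY.COM";  // full principal of the same user (hypothetical)
     boolean sameUser = storedAtSubmit.equals(requesterName);
     System.out.println(sameUser);              // false -> AMRMToken is not included
   }
 }
 {code}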



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238140#comment-14238140
 ] 

Hadoop QA commented on YARN-2637:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685475/YARN-2637.16.patch
  against trunk revision 144da2e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6035//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6035//console

This message is automatically generated.

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue will be calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it will check whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the 
 maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the 
 number of AMs that can be launched is 200. If each AM actually uses 5M 
 (> minimum_allocation), all of those apps can still be activated, and together 
 they occupy 200 * 5M = 1G, i.e. all of the queue's resources instead of only 
 max_am_resource_percent of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2014-12-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238139#comment-14238139
 ] 

Junping Du commented on YARN-2637:
--

I manually kicked off the Jenkins test again for the latest patch.

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue will be calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it will check whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the 
 maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the 
 number of AMs that can be launched is 200. If each AM actually uses 5M 
 (> minimum_allocation), all of those apps can still be activated, and together 
 they occupy 200 * 5M = 1G, i.e. all of the queue's resources instead of only 
 max_am_resource_percent of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException

2014-12-08 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2910:
-
Assignee: Wilfred Spiegelenburg  (was: Ray Chiang)

 FSLeafQueue can throw ConcurrentModificationException
 -

 Key: YARN-2910
 URL: https://issues.apache.org/jira/browse/YARN-2910
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
 Attachments: FSLeafQueue_concurrent_exception.txt, 
 YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, 
 YARN-2910.4.patch, YARN-2910.patch


 The lists that maintain the runnable and the non-runnable apps are standard 
 ArrayLists, but there is no guarantee that they will only be manipulated by one 
 thread in the system. This can lead to the following exception:
 {noformat}
 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM.
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
 at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
 at java.util.ArrayList$Itr.next(ArrayList.java:831)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
 {noformat}
 Full stack trace in the attached file.
 We should guard against that by using a thread-safe alternative from 
 java.util.concurrent, such as CopyOnWriteArrayList.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2014-12-08 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2495:

Attachment: YARN-2495.20141208-1.patch

hi [~wangda],
I have set the default implementation class to null as per your comment. But as 
you mentioned, after review and based on feedback, we need to set the default 
class to either the configuration-based or the script-based Node Labels provider 
as the default provider class. 

 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
 YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
 YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
 YARN-2495_20141022.1.patch


 The target of this JIRA is to allow the admin to specify labels on each NM; this covers:
 - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
 using a script as suggested by [~aw] (YARN-2729))
 - NM will send labels to the RM via the ResourceTracker API
 - RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2925) Internal fields in LeafQueue access should be protected when accessed from FiCaSchedulerApp to calculate Headroom

2014-12-08 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238155#comment-14238155
 ] 

Craig Welch commented on YARN-2925:
---

Hmm, there might be an even simpler approach. If we placed lock(s) (just a 
single lock, or potentially read/write) in the LeafQueue and then just held 
them around the final headroom calculation and the two locations where other 
changes occur (user consumed +/- and queue usedResources +/-), all of which I 
believe occur in the leaf queue, and then set up the lastClusterResource to be 
copied (inside the (write) lock), I think this would be resolved, and it would 
not be much of a change / much code.  In fact, we would not need the 
queueresourceinfo at all, and could potentially drop the headroominfo as well.  
[~leftnoteasy] I think this might actually be the simplest approach. Thoughts?
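A rough sketch of that idea, using a ReentrantReadWriteLock (the class and field 
names are illustrative stand-ins, not the actual LeafQueue members):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: mutate user-consumed / queue usedResources and copy
// lastClusterResource under the write lock, and compute headroom under the
// read lock, so the calculation always sees a consistent snapshot.
class HeadroomGuard {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long usedResources;        // stands in for the queue's used resources
  private long lastClusterResource;  // copied inside the (write) lock

  void updateUsage(long delta, long clusterResource) {
    lock.writeLock().lock();
    try {
      usedResources += delta;
      lastClusterResource = clusterResource;
    } finally {
      lock.writeLock().unlock();
    }
  }

  long computeHeadroom(long queueMax) {
    lock.readLock().lock();
    try {
      return Math.max(0L, Math.min(queueMax, lastClusterResource) - usedResources);
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}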

 Internal fields in LeafQueue access should be protected when accessed from 
 FiCaSchedulerApp to calculate Headroom
 -

 Key: YARN-2925
 URL: https://issues.apache.org/jira/browse/YARN-2925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2925.1.patch


 As of YARN-2644, FiCaScheduler will calculate up-to-date headroom before 
 sending back the Allocation response to the AM.
 The headroom calculation happens on the LeafQueue side and uses fields like used 
 resource, etc. But it is not protected by any lock of LeafQueue, so it might 
 be corrupted if someone else is editing it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2014-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238157#comment-14238157
 ] 

Wangda Tan commented on YARN-2495:
--

[~Naganarasimha],
I meant you can leave it empty in this patch and set it after you have 
completed the script/conf-based patch; just adding a TODO comment should be fine.
Make sense?

 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
 YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
 YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
 YARN-2495_20141022.1.patch


 The target of this JIRA is to allow the admin to specify labels on each NM; this covers:
 - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
 using a script as suggested by [~aw] (YARN-2729))
 - NM will send labels to the RM via the ResourceTracker API
 - RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-12-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238158#comment-14238158
 ] 

Zhijie Shen commented on YARN-2517:
---

[~ozawa], we may have to hold off a bit on the async call implementation. 
Recently folks have had some offline discussions around the timeline server next 
gen. Some of the architecture may change in the future. Would you please keep an 
eye on YARN-2928? 

 Implement TimelineClientAsync
 -

 Key: YARN-2517
 URL: https://issues.apache.org/jira/browse/YARN-2517
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2517.1.patch, YARN-2517.2.patch


 In some scenarios, we'd like to put timeline entities in another thread so as 
 not to block the current one.
 It would be good to have a TimelineClientAsync like AMRMClientAsync and 
 NMClientAsync. It can buffer entities, put them in a separate thread, and 
 have callbacks to handle the responses.
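 A rough sketch of the shape such a client could take (TimelineClient, 
 TimelineEntity and TimelinePutResponse are the existing types; the callback 
 interface and class below are hypothetical):
 {code}
 import java.util.concurrent.BlockingQueue;
 import java.util.concurrent.LinkedBlockingQueue;

 import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
 import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
 import org.apache.hadoop.yarn.client.api.TimelineClient;

 // Illustrative only: buffer entities in a queue, drain them on a separate
 // thread, and hand each response (or error) to a callback.
 class TimelineClientAsyncSketch {
   interface Callback {
     void onResponse(TimelinePutResponse response);
     void onError(Throwable t);
   }

   private final TimelineClient client;
   private final Callback callback;
   private final BlockingQueue<TimelineEntity> buffer =
       new LinkedBlockingQueue<TimelineEntity>();

   TimelineClientAsyncSketch(TimelineClient client, Callback callback) {
     this.client = client;
     this.callback = callback;
     Thread dispatcher = new Thread(new Runnable() {
       public void run() {
         drainLoop();
       }
     }, "timeline-async-dispatcher");
     dispatcher.setDaemon(true);
     dispatcher.start();
   }

   // Non-blocking from the caller's point of view.
   void putEntityAsync(TimelineEntity entity) {
     buffer.add(entity);
   }

   private void drainLoop() {
     while (!Thread.currentThread().isInterrupted()) {
       try {
         TimelineEntity next = buffer.take();
         callback.onResponse(client.putEntities(next));
       } catch (InterruptedException ie) {
         Thread.currentThread().interrupt();
       } catch (Exception e) {
         callback.onError(e);
       }
     }
   }
 }
 {code}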



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2014-12-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238165#comment-14238165
 ] 

Sangjin Lee commented on YARN-2928:
---

Thanks [~vinodkv]! I'll post the design doc pretty soon (today or tomorrow).

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 We have the application timeline server implemented in YARN per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-08 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2762:
-
Attachment: YARN-2762.6.patch

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
 YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
 YARN-2762.patch


 All NodeLabel argument validations are done at the server side. The same can be 
 done at RMAdminCLI so that unnecessary RPC calls can be avoided.
 And for input such as "x,y,,z,", there is no need to add an empty string; it 
 can simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238176#comment-14238176
 ] 

Rohith commented on YARN-2762:
--

Updated the patch fixing the review comment. Kindly review the updated patch.

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
 YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
 YARN-2762.patch


 All NodeLabel argument validations are done at the server side. The same can be 
 done at RMAdminCLI so that unnecessary RPC calls can be avoided.
 And for input such as "x,y,,z,", there is no need to add an empty string; it 
 can simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml

2014-12-08 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238183#comment-14238183
 ] 

Ray Chiang commented on YARN-2284:
--

The initial version of this fix requires all Configuration properties to exist 
within the .xml files.

 Find missing config options in YarnConfiguration and yarn-default.xml
 -

 Key: YARN-2284
 URL: https://issues.apache.org/jira/browse/YARN-2284
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Attachments: YARN-2284-04.patch, YARN-2284-05.patch, 
 YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, 
 YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch


 YarnConfiguration has one set of properties.  yarn-default.xml has another 
 set of properties.  Ideally, there should be an automatic way to find missing 
 properties in either location.
 This is analogous to MAPREDUCE-5130, but for yarn-default.xml.
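 A rough sketch of the kind of automated check being proposed (illustrative 
 only; the class name and the reflection-based approach are assumptions, not 
 the actual patch):
 {code}
 import java.lang.reflect.Field;
 import java.lang.reflect.Modifier;
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 // Illustrative only: collect the "yarn." String constants declared in
 // YarnConfiguration and diff them against the keys found in yarn-default.xml.
 public class FindMissingYarnProperties {
   public static void main(String[] args) throws Exception {
     Set<String> declared = new HashSet<String>();
     for (Field f : YarnConfiguration.class.getDeclaredFields()) {
       if (f.getType() == String.class && Modifier.isStatic(f.getModifiers())) {
         f.setAccessible(true);
         Object value = f.get(null);
         if (value instanceof String && ((String) value).startsWith("yarn.")) {
           declared.add((String) value);
         }
       }
     }

     Configuration conf = new Configuration(false);
     conf.addResource("yarn-default.xml");   // must be on the classpath
     Set<String> inXml = new HashSet<String>();
     for (Map.Entry<String, String> entry : conf) {
       inXml.add(entry.getKey());
     }

     for (String key : declared) {
       if (!inXml.contains(key)) {
         System.out.println("Missing from yarn-default.xml: " + key);
       }
     }
     for (String key : inXml) {
       if (!declared.contains(key)) {
         System.out.println("Missing from YarnConfiguration: " + key);
       }
     }
   }
 }
 {code}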



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException

2014-12-08 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238184#comment-14238184
 ] 

Ray Chiang commented on YARN-2910:
--

Never mind, I misread the earlier JIRA history.  I must have accidentally 
clicked on "Assign to me" while scrolling around.

With the newest unit test and *without* the code fix (i.e. expecting failures), 
I'm seeing a failure rate around 70%.  I think it would still be a good idea to 
increase the modifications to get the failure rate higher (as Tsuyoshi 
suggested earlier).  I can get 10/10 failures with a value of 400 in the modify 
for loop.

 FSLeafQueue can throw ConcurrentModificationException
 -

 Key: YARN-2910
 URL: https://issues.apache.org/jira/browse/YARN-2910
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
 Attachments: FSLeafQueue_concurrent_exception.txt, 
 YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, 
 YARN-2910.4.patch, YARN-2910.patch


 The lists that maintain the runnable and the non-runnable apps are standard 
 ArrayLists, but there is no guarantee that they will only be manipulated by one 
 thread in the system. This can lead to the following exception:
 {noformat}
 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM.
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
 at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
 at java.util.ArrayList$Itr.next(ArrayList.java:831)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
 {noformat}
 Full stack trace in the attached file.
 We should guard against that by using a thread-safe alternative from 
 java.util.concurrent, such as CopyOnWriteArrayList.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml

2014-12-08 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2284:
-
Attachment: YARN-2284-09.patch

Updated for latest Configuration variables.

 Find missing config options in YarnConfiguration and yarn-default.xml
 -

 Key: YARN-2284
 URL: https://issues.apache.org/jira/browse/YARN-2284
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Attachments: YARN-2284-04.patch, YARN-2284-05.patch, 
 YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, 
 YARN-2284-09.patch, YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch


 YarnConfiguration has one set of properties.  yarn-default.xml has another 
 set of properties.  Ideally, there should be an automatic way to find missing 
 properties in either location.
 This is analogous to MAPREDUCE-5130, but for yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2927) InMemorySCMStore properties are inconsistent

2014-12-08 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238211#comment-14238211
 ] 

Chris Trezzo commented on YARN-2927:


Thanks [~rchiang] for the fix!

 InMemorySCMStore properties are inconsistent
 

 Key: YARN-2927
 URL: https://issues.apache.org/jira/browse/YARN-2927
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: newbie, supportability
 Fix For: 2.7.0

 Attachments: YARN-2927.001.patch, YARN-2927.002.patch


 I see these properties in the yarn-default.xml file:
   yarn.sharedcache.store.in-memory.check-period-mins
   yarn.sharedcache.store.in-memory.initial-delay-mins
   yarn.sharedcache.store.in-memory.staleness-period-mins
 YarnConfiguration looks like it's missing some properties:
   public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
   public static final String SCM_STORE_PREFIX = SHARED_CACHE_PREFIX + "store.";
   public static final String IN_MEMORY_STORE_PREFIX = SHARED_CACHE_PREFIX + "in-memory.";
   public static final String IN_MEMORY_STALENESS_PERIOD_MINS = IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
 It looks like the definition for IN_MEMORY_STORE_PREFIX should be:
   public static final String IN_MEMORY_STORE_PREFIX = SCM_STORE_PREFIX + "in-memory.";
 Just to be clear, there are properties that exist in yarn-default.xml that 
 are effectively misspelled in the *Java* file, not the .xml file.  This is 
 similar to YARN-2461 and MAPREDUCE-6087.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238237#comment-14238237
 ] 

Hadoop QA commented on YARN-2762:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685790/YARN-2762.6.patch
  against trunk revision 144da2e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 31 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6037//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6037//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6037//console

This message is automatically generated.

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
 YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
 YARN-2762.patch


 All NodeLabel argument validations are done at the server side. The same can be 
 done at RMAdminCLI so that unnecessary RPC calls can be avoided.
 And for input such as "x,y,,z,", there is no need to add an empty string; it 
 can simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2014-12-08 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238255#comment-14238255
 ] 

Mit Desai commented on YARN-2900:
-

[~zjshen], from the changes that the patch makes, the only time that NotFound 
is thrown is when there is no application|attempt|container that the client is 
asking for. I am not sure why the timelineserver throws some exception and we 
get a NotFound in the browser. Can you explain what test you did here?
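For reference, a sketch of the kind of handling being discussed here (this 
assumes org.apache.hadoop.yarn.webapp.NotFoundException and is not necessarily 
what the patch does):
{code}
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.webapp.NotFoundException;

// Illustrative only: translate a missing application into a 404-style
// NotFoundException instead of letting a NullPointerException bubble up
// as an Internal Server Error (500).
class AppLookup {
  static ApplicationReport getAppOr404(
      Map<ApplicationId, ApplicationReport> history, ApplicationId appId) {
    ApplicationReport report = history.get(appId);
    if (report == null) {
      throw new NotFoundException("app with id: " + appId + " not found");
    }
    return report;
  }
}
{code}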

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch, YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2618) Avoid over-allocation of disk resources

2014-12-08 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2618:
--
Attachment: YARN-2618-2.patch

Thanks, [~kasha]. Updated a new patch to fix the comments.
The existing patch works well with FairScheduler. But for FifoScheduler and 
CapacityScheduler, it cannot avoid over-allocating disk resources. This is 
because both Fifo and Capacity only consider memory capacity when assigning 
containers to nodes, and they support over-consuming cpu resources. [~jianhe], 
do you know any special reason why CapacityScheduler supports over-consuming cpu 
resources?

 Avoid over-allocation of disk resources
 ---

 Key: YARN-2618
 URL: https://issues.apache.org/jira/browse/YARN-2618
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2618-1.patch, YARN-2618-2.patch


 Subtask of YARN-2139. 
 This should include
 - Add API support for introducing disk I/O as the 3rd type resource.
 - NM should report this information to the RM
 - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2683) registry config options: document and move to core-default

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238303#comment-14238303
 ] 

Hadoop QA commented on YARN-2683:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12685776/HADOOP-10530-005.patch
  against trunk revision 57cb43b.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6039//console

This message is automatically generated.

 registry config options: document and move to core-default
 --

 Key: YARN-2683
 URL: https://issues.apache.org/jira/browse/YARN-2683
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-10530-005.patch, YARN-2683-001.patch, 
 YARN-2683-002.patch, YARN-2683-003.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Add to {{yarn-site}} a page on registry configuration parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster

2014-12-08 Thread Sevada Abraamyan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238307#comment-14238307
 ] 

Sevada Abraamyan commented on YARN-2892:


I don't see this either. However, one thing I did notice is that with the patch 
we are now changing how the ClientToAMToken is constructed, as we are using the 
short name instead of the full name. 

{code}
@Override
public ApplicationReport createAndGetApplicationReport(String clientUserName,
    boolean allowAccess) {

    if (UserGroupInformation.isSecurityEnabled()) {
      // get a token so the client can communicate with the app attempt
      // NOTE: token may be unavailable if the attempt is not running
      Token<ClientToAMTokenIdentifier> attemptClientToAMToken =
          this.currentAttempt.createClientToken(clientUserName);
      if (attemptClientToAMToken != null) {
        clientToAMToken = BuilderUtils.newClientToAMToken(
            attemptClientToAMToken.getIdentifier(),
            attemptClientToAMToken.getKind().toString(),
            attemptClientToAMToken.getPassword(),
            attemptClientToAMToken.getService().toString());
      }
...
{code}

 Unable to get AMRMToken in unmanaged AM when using a secure cluster
 ---

 Key: YARN-2892
 URL: https://issues.apache.org/jira/browse/YARN-2892
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sevada Abraamyan
Assignee: Sevada Abraamyan
 Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch


 An AMRMToken is retrieved from the ApplicationReport by the YarnClient. 
 When the RM creates the ApplicationReport and sends it back to the client, it 
 makes a simple security check on whether it should include the AMRMToken in the 
 report (see createAndGetApplicationReport in RMAppImpl). This security check 
 verifies that the user who submitted the original application is the same 
 user who is requesting the ApplicationReport. If they are indeed the same 
 user then it includes the AMRMToken, otherwise it does not include it.
 The problem arises from the fact that when an application is submitted, the 
 RM saves the short username of the user who created the application (see 
 submitApplication in ClientRmService). Afterwards, when the ApplicationReport 
 is requested, the system tries to match the full username of the requester 
 against the previously stored short username. 
 In a secure cluster using Kerberos this check fails because the realm is 
 stripped from the principal when we request a short username. So, for example, 
 the short username might be "Foo" whereas the full username is 
 f...@company.com
 Note: A very similar problem has been previously reported 
 ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232])



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-405) Add command start-up time to environment of a container to track launch costs

2014-12-08 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah resolved YARN-405.
--
Resolution: Not a Problem

 Add command start-up time to environment of a container to track launch costs
 -

 Key: YARN-405
 URL: https://issues.apache.org/jira/browse/YARN-405
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
  Labels: container
 Attachments: YARN-405.1.patch


 For applications like MapReduce, jvm launch cost has always been considered a 
 factor in performance. Adding some basic information into the environment 
 will allow an application to track its startup costs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-510) Writing Yarn Applications documentation should be changed to signify use of fully qualified paths when localizing resources

2014-12-08 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-510:
-
Assignee: (was: Hitesh Shah)

 Writing Yarn Applications documentation should be changed to signify use of 
 fully qualified paths when localizing resources
 --

 Key: YARN-510
 URL: https://issues.apache.org/jira/browse/YARN-510
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.0-alpha
Reporter: Hitesh Shah

 Path jarPath = new Path("/Working_HDFS_DIR/" + appId + "/" + AM_JAR);
 fs.copyFromLocalFile(new Path("/local/src/AM.jar"), jarPath); // VALIDATED jar is in HDFS under correct PATH
 FileStatus jarStatus = fs.getFileStatus(jarPath);
 LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
 amJarRsrc.setType(LocalResourceType.FILE);
 amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
 amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
 amJarRsrc.setTimestamp(jarStatus.getModificationTime());
 amJarRsrc.setSize(jarStatus.getLen());
 localResources.put("AppMaster.jar", amJarRsrc);
 amContainer.setLocalResources(localResources);
 Error logs (nodeManager.log)
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Application application_1364219323374_0016 transitioned from INITING to 
 RUNNING
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Got exception parsing AppMaster.jar and value resource {, port: -1, file: 
 /Working_HDFS_DIR/application_1364219323374_0016/AM.jar, }, size: 13940, 
 timestamp: 1364230436600, type: FILE, visibility: APPLICATION, 
 2013-03-25 17:53:57,391 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Failed to parse resource-request
 java.net.URISyntaxException: Expected scheme name at index 0: 
 :///Working_HDFS_DIR/application_1364219323374_0016/AM.jar
   at java.net.URI$Parser.fail(URI.java:2810)
   at java.net.URI$Parser.failExpecting(URI.java:2816)
   at java.net.URI$Parser.parse(URI.java:3008)
   at java.net.URI.init(URI.java:735)
   at 
 org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:70)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.init(LocalResourceRequest.java:46)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:501)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:472)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:382)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMa
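 A minimal sketch of the fully qualified approach the documentation should call 
 out (illustrative only; it assumes the default FileSystem is HDFS and reuses 
 the snippet above):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.yarn.api.records.LocalResource;
 import org.apache.hadoop.yarn.api.records.LocalResourceType;
 import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
 import org.apache.hadoop.yarn.util.ConverterUtils;
 import org.apache.hadoop.yarn.util.Records;

 // Illustrative only: qualify the path with the FileSystem's scheme and
 // authority before converting it to a YARN URL, so the NM sees
 // hdfs://host:port/... instead of a scheme-less path.
 class LocalResourceSetup {
   static LocalResource jarResource(Configuration conf, Path jarPath) throws Exception {
     FileSystem fs = FileSystem.get(conf);
     Path qualified = fs.makeQualified(jarPath);   // adds hdfs://host:port
     FileStatus status = fs.getFileStatus(qualified);

     LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
     amJarRsrc.setType(LocalResourceType.FILE);
     amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
     amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(qualified));
     amJarRsrc.setTimestamp(status.getModificationTime());
     amJarRsrc.setSize(status.getLen());
     return amJarRsrc;
   }
 }
 {code}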



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-436) Document how to use DistributedShell yarn application

2014-12-08 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-436:
-
Assignee: (was: Hitesh Shah)

 Document how to use DistributedShell yarn application
 -

 Key: YARN-436
 URL: https://issues.apache.org/jira/browse/YARN-436
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Hitesh Shah





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238332#comment-14238332
 ] 

Hadoop QA commented on YARN-2284:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685803/YARN-2284-09.patch
  against trunk revision ffe942b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 74 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6038//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6038//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6038//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6038//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6038//console

This message is automatically generated.

 Find missing config options in YarnConfiguration and yarn-default.xml
 -

 Key: YARN-2284
 URL: https://issues.apache.org/jira/browse/YARN-2284
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Attachments: YARN-2284-04.patch, YARN-2284-05.patch, 
 YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, 
 YARN-2284-09.patch, YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch


 YarnConfiguration has one set of properties.  yarn-default.xml has another 
 set of properties.  Ideally, there should be an automatic way to find missing 
 properties in either location.
 This is analogous to MAPREDUCE-5130, but for yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238336#comment-14238336
 ] 

Hadoop QA commented on YARN-2495:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12685787/YARN-2495.20141208-1.patch
  against trunk revision 144da2e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 38 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6036//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6036//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6036//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6036//console

This message is automatically generated.

 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
 YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
 YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
 YARN-2495_20141022.1.patch


 The target of this JIRA is to allow admins to specify labels on each NM. This covers:
 - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
 using the script suggested by [~aw] (YARN-2729))
 - The NM will send its labels to the RM via the ResourceTracker API
 - The RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml

2014-12-08 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238356#comment-14238356
 ] 

Ray Chiang commented on YARN-2284:
--

RE: Findbugs. I don't see any of the findbugs warnings in the code 
added/deleted by this patch.

 Find missing config options in YarnConfiguration and yarn-default.xml
 -

 Key: YARN-2284
 URL: https://issues.apache.org/jira/browse/YARN-2284
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Attachments: YARN-2284-04.patch, YARN-2284-05.patch, 
 YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, 
 YARN-2284-09.patch, YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch


 YarnConfiguration has one set of properties.  yarn-default.xml has another 
 set of properties.  Ideally, there should be an automatic way to find missing 
 properties in either location.
 This is analogous to MAPREDUCE-5130, but for yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238357#comment-14238357
 ] 

Wangda Tan commented on YARN-2762:
--

Looks good, thanks for the update. The test failure shouldn't be related, but could you 
take a look at the findbugs warnings?

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
 YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
 YARN-2762.patch


 All NodeLabel argument validation is done on the server side. The same can be done 
 in RMAdminCLI so that unnecessary RPC calls can be avoided.
 And for input such as x,y,,z,, there is no need to add an empty string; it can 
 simply be skipped (see the sketch below).
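
 A minimal illustration of the intended client-side handling (not the patch itself; 
 the helper and the sample input below are made up for the example):
 {code}
 import java.util.HashSet;
 import java.util.Set;

 public class LabelArgExample {
   // Trim comma-separated label arguments and skip empty entries,
   // e.g. "x,y,,z," should yield only x, y and z.
   static Set<String> parseLabels(String arg) {
     Set<String> labels = new HashSet<String>();
     for (String label : arg.split(",")) {
       String trimmed = label.trim();
       if (!trimmed.isEmpty()) {
         labels.add(trimmed);
       }
     }
     return labels;
   }

   public static void main(String[] args) {
     System.out.println(parseLabels(" x , y ,,z, ")); // e.g. [x, y, z] (order may vary)
   }
 }
 {code}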



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2571) RM to support YARN registry

2014-12-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2571:
-
Attachment: YARN-2571-009.patch

patch -009 in sync with trunk

 RM to support YARN registry 
 

 Key: YARN-2571
 URL: https://issues.apache.org/jira/browse/YARN-2571
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
 YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
 YARN-2571-008.patch, YARN-2571-009.patch


 The RM needs to (optionally) integrate with the YARN registry:
 # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
 principals)
 # app-launch: create the user directory /users/$username with the relevant 
 permissions (CRD) for them to create subnodes.
 # attempt, container, app completion: remove service records with the 
 matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2925) Internal fields in LeafQueue access should be protected when accessed from FiCaSchedulerApp to calculate Headroom

2014-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238365#comment-14238365
 ] 

Wangda Tan commented on YARN-2925:
--

[~cwelch],
Thanks for your comments; your suggestion makes sense to me. I will:
- Drop the existing QueueResourceInfo implementation and do the refactoring 
in a future patch
- Add a fine-grained lock only for the headroom computation to resolve both the 
consistency and the staleness issues; it will cover the user's consumed resource 
and the queue's used resource. I suggest using a read/write lock for better 
performance (a rough sketch follows).
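
For illustration only, a minimal sketch of the read/write-lock idea (the field names and 
the headroom formula are simplified stand-ins, not the eventual patch):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HeadroomLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Simplified stand-ins for the queue's used resource and the user's consumed resource.
  private long queueUsedMB;
  private long userConsumedMB;

  // Allocate/release paths take the write lock while mutating the fields.
  public void updateUsage(long queueDeltaMB, long userDeltaMB) {
    lock.writeLock().lock();
    try {
      queueUsedMB += queueDeltaMB;
      userConsumedMB += userDeltaMB;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // The headroom computation only needs a consistent snapshot, so many readers
  // can proceed concurrently under the read lock.
  public long computeHeadroomMB(long queueLimitMB) {
    lock.readLock().lock();
    try {
      return Math.max(0L, queueLimitMB - Math.max(queueUsedMB, userConsumedMB));
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}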

Any thoughts? Will work on a patch later.

Thanks,

 Internal fields in LeafQueue access should be protected when accessed from 
 FiCaSchedulerApp to calculate Headroom
 -

 Key: YARN-2925
 URL: https://issues.apache.org/jira/browse/YARN-2925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2925.1.patch


 As of YARN-2644, FiCaScheduler will calculate an up-to-date headroom before 
 sending the Allocation response back to the AM.
 The headroom calculation happens on the LeafQueue side and uses fields like used 
 resource, etc. But it is not protected by any lock of the LeafQueue, so it might 
 be corrupted if someone else is editing it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2931:
---

 Summary: PublicLocalizer may fail with FileNotFoundException until 
directory gets initialized by LocalizeRunner
 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot


When the data directory is cleaned up and the NM is started with existing recovery 
state, it will not recreate the local dirs because of YARN-90.
This causes the PublicLocalizer to fail until getInitializedLocalDirs is called 
by some LocalizeRunner doing private localization.

Instead we can have the PublicLocalizer not depend on this and also call 
getInitializedLocalDirs, so it can handle initialization on its own, similar to 
non-public localization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2931:

Description: 
When the data directory is cleaned up and NM is started with existing recovery 
state, because of YARN-90, it will not recreate the local dirs.
This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
due to some LocalizeRunner for private localization.

Instead we can have PublicLocalizer not depend on this and also call 
getInitializedLocalDirs so it can handle initialization on its own similar to 
non public localization

Example error 

{noformat}
2014-12-02 22:57:32,629 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Failed to download rsrc { { hdfs:/blah 
machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
 1417589819618, FILE, null 
},pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
at 
org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-12-02 22:57:32,629 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1417589109512_0001_02_03 transitioned from LOCALIZING 
to LOCALIZATION_FAILED
{noformat}

  was:
When the data directory is cleaned up and NM is started with existing recovery 
state, because of YARN-90, it will not recreate the local dirs.
This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
due to some LocalizeRunner for private localization.

Instead we can have PublicLocalizer not depend on this and also call 
getInitializedLocalDirs so it can handle initialization on its own similar to 
non public localization


 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot

 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Instead we can have PublicLocalizer not depend on this and also call 
 getInitializedLocalDirs so it can handle initialization on its own similar to 
 non public localization
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 

[jira] [Assigned] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-2931:
---

Assignee: Anubhav Dhoot

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Instead we can have PublicLocalizer not depend on this and also call 
 getInitializedLocalDirs so it can handle initialization on its own similar to 
 non public localization
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2931:

Attachment: YARN-2931.001.patch

Let the PublicLocalizer also initialize the local directories, similar to the 
LocalizerRunner (a rough sketch of the idea is below).
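
As a rough illustration of the intended change (the helper below is hypothetical and 
only mimics what getInitializedLocalDirs achieves; the real patch works inside 
ResourceLocalizationService rather than a standalone class):

{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PublicCacheInitSketch {
  // Ensure <local-dir>/filecache exists before handing a resource to the public
  // download pool, mirroring what private localization gets from
  // getInitializedLocalDirs(); "filecache" is the NM public cache directory name.
  static void ensurePublicCacheDirs(List<String> localDirs) throws IOException {
    FileContext lfs = FileContext.getLocalFSFileContext();
    for (String dir : localDirs) {
      Path publicCache = new Path(dir, "filecache");
      if (!lfs.util().exists(publicCache)) {
        lfs.mkdir(publicCache, FsPermission.getDirDefault(), true); // create parents
      }
    }
  }
}
{code}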

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Instead we can have PublicLocalizer not depend on this and also call 
 getInitializedLocalDirs so it can handle initialization on its own similar to 
 non public localization
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238480#comment-14238480
 ] 

Karthik Kambatla commented on YARN-2931:


This was initially in the description, from Anubhav:

Instead we can have PublicLocalizer not depend on this and also call 
getInitializedLocalDirs so it can handle initialization on its own similar to 
non public localization


 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster

2014-12-08 Thread Sevada Abraamyan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238520#comment-14238520
 ] 

Sevada Abraamyan commented on YARN-2892:


On second thought, I think [~djp] was referring directly to the code I 
referenced above. Since we'd rather not modify the public interface of RMApp, 
maybe we should continue passing the full username to 
_createAndGetApplicationReport_, and prior to the AMRMToken security check we can use 
this full username to construct a short username. It seems a bit hacky, but I'm 
not sure how else we can avoid breaking the public interface.

The easiest way I can see doing this is by using something like the following:

{code}
// Build a UGI from the full (principal) name and derive the short name from it.
UserGroupInformation remoteUser =
    UserGroupInformation.createRemoteUser(clientUserName);
String shortUsername = remoteUser.getShortUserName();
{code}

Another solution could be to do the following:

{code}
// If security is set to Kerberos: map the principal through the auth_to_local rules.
HadoopKerberosName kbName = new HadoopKerberosName(clientUserName);
String shortUsername = kbName.getShortName(); // may throw IOException
{code} 

The first solution is a bit strange but looks more attractive to me as it 
allows _RMAppImpl_ to stay agnostic to the underlying security framework. Any 
suggestions?
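
One aside on the second option, in case it matters for the comparison (my 
understanding, not something verified against the patch): HadoopKerberosName relies 
on the hadoop.security.auth_to_local rules, which are normally loaded during 
UserGroupInformation initialization and can also be set explicitly:

{code}
// "conf" is assumed to be the RM's Configuration; this loads the
// hadoop.security.auth_to_local rules used by getShortName().
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(conf);
{code}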


 Unable to get AMRMToken in unmanaged AM when using a secure cluster
 ---

 Key: YARN-2892
 URL: https://issues.apache.org/jira/browse/YARN-2892
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sevada Abraamyan
Assignee: Sevada Abraamyan
 Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch


 An AMRMToken is retrieved from the ApplicationReport by the YarnClient. 
 When the RM creates the ApplicationReport and sends it back to the client it 
 makes a simple security check whether it should include the AMRMToken in the 
 report (See createAndGetApplicationReport in RMAppImpl).This security check 
 verifies that the user who submitted the original application is the same 
 user who is requesting the ApplicationReport. If they are indeed the same 
 user then it includes the AMRMToken, otherwise it does not include it.
 The problem arises from the fact that when an application is submitted, the 
 RM  saves the short username of the user who created the application (See 
 submitApplication in ClientRmService). Afterwards when the ApplicationReport 
 is requested, the system tries to match the full username of the requester 
 against the previously stored short username. 
 In a secure cluster using Kerberos this check fails because the principal is 
 stripped from the username when we request a short username. So for example 
 the short username might be Foo whereas the full username is 
 f...@company.com
 Note: A very similar problem has been previously reported 
 ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232])



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238529#comment-14238529
 ] 

Hadoop QA commented on YARN-2931:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685852/YARN-2931.001.patch
  against trunk revision 6c5bbd7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1219 javac 
compiler warnings (more than the trunk's current 1217 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6040//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6040//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6040//console

This message is automatically generated.

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging

2014-12-08 Thread Eric Payne (JIRA)
Eric Payne created YARN-2932:


 Summary: Add entry for preemption setting to queue status screen 
and startup/refresh logging
 Key: YARN-2932
 URL: https://issues.apache.org/jira/browse/YARN-2932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.7.0
Reporter: Eric Payne


YARN-2056 enables the ability to turn preemption on or off on a per-queue 
level. This JIRA will provide the preemption status for each queue in the 
{{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging

2014-12-08 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-2932:


Assignee: Eric Payne

 Add entry for preemption setting to queue status screen and startup/refresh 
 logging
 ---

 Key: YARN-2932
 URL: https://issues.apache.org/jira/browse/YARN-2932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne

 YARN-2056 enables the ability to turn preemption on or off on a per-queue 
 level. This JIRA will provide the preemption status for each queue in the 
 {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
 refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2931:

Attachment: YARN-2931.002.patch

Fixed javac warnings

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging

2014-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238582#comment-14238582
 ] 

Wangda Tan commented on YARN-2932:
--

Thanks for raising this, [~eepayne], it is a good addition.

IIRC, YARN-2056 put the per-queue disable-preemption configuration code in 
ProportionalCapacityPreemptionPolicy instead of putting it into 
CapacitySchedulerConfiguration. But after reading this proposal, I think we should 
move it to CapacitySchedulerConfiguration, and getIsPreemptionDisabled should 
be a method of the CSQueue interface. Thoughts? (A rough sketch is below.)
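
To make the suggestion concrete, a rough sketch of what such a per-queue flag could 
look like (the property name and the fall-back-to-parent behavior are hypothetical, 
not the final API):

{code}
import org.apache.hadoop.conf.Configuration;

public class PreemptionFlagSketch {
  // Hypothetical per-queue property, e.g.
  // yarn.scheduler.capacity.<queue-path>.disable_preemption
  static boolean getPreemptionDisabled(Configuration conf, String queuePath,
      boolean parentDisabled) {
    // Fall back to the parent queue's setting when the queue does not override it.
    return conf.getBoolean(
        "yarn.scheduler.capacity." + queuePath + ".disable_preemption",
        parentDisabled);
  }
}
{code}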

 Add entry for preemption setting to queue status screen and startup/refresh 
 logging
 ---

 Key: YARN-2932
 URL: https://issues.apache.org/jira/browse/YARN-2932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne

 YARN-2056 enables the ability to turn preemption on or off on a per-queue 
 level. This JIRA will provide the preemption status for each queue in the 
 {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
 refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2014-12-08 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2900:

Attachment: YARN-2900.patch

Attaching the patch that addresses the NFE and the indenting. I'll wait for 
your response on the IllegalStateException.

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2014-12-08 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2900:

Attachment: YARN-2900.patch

Refining the patch. I missed removing an unused import.

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2014-12-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238619#comment-14238619
 ] 

Zhijie Shen commented on YARN-2900:
---

bq.  I am not sure why the timelineserver throws some exception and we get a 
NotFound on the browser. Can you explain what was the test that you did here?

What I did:

1. Start the timeline server while the system metrics publisher is enabled for the RM.
2. Submit an MR example job.
3. Type 
{{http://localhost:8188/ws/v1/applicationhistory/apps/application_1417818619773_0001?user.name=zshen}}
 in a browser and check the output, which is correct.
4. Type 
{{http://localhost:8188/ws/v1/applicationhistory/apps/application_1417818619773_0002?user.name=zshen}},
 and look for the NOT_FOUND message. However, there is no response at all, and I see the 
aforementioned exception in the timeline server log.

I applied this patch on trunk and could reproduce the issue. Undoing this 
patch, 
{{http://localhost:8188/ws/v1/applicationhistory/apps/application_1417818619773_0002?user.name=zshen}}
 will return Internal Server Error (500), which is the expected current 
behavior. Did you have a chance to reproduce it on your side?

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-12-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2837:
--
Attachment: YARN-2837.5.patch

The new patch does two more things:

1. Correct the logic of storing the version by differentiating between creating 
a new state store and opening an existing one. It seems that 
LeveldbTimelineStore needs to be fixed too; let's treat that as a separate issue.

2. Like RMDelegationTokenIdentifierData, create a 
TimelineDelegationTokenIdentifierData to wrap all fields to be serialized into 
leveldb, for better compatibility if we add more fields in the future (see the 
sketch below).
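
For illustration, roughly the shape such a wrapper takes; the real class would mirror 
the protobuf-backed RMDelegationTokenIdentifierData, so the plain DataOutput 
serialization below is only there to show the idea of bundling the fields:

{code}
import java.io.DataOutput;
import java.io.IOException;

// Sketch only: bundle everything stored as the leveldb value behind one class,
// so that new fields can be appended later without breaking old entries.
public class TimelineDelegationTokenDataSketch {
  private final byte[] tokenIdentifier; // serialized TimelineDelegationTokenIdentifier
  private final long renewDate;

  public TimelineDelegationTokenDataSketch(byte[] tokenIdentifier, long renewDate) {
    this.tokenIdentifier = tokenIdentifier;
    this.renewDate = renewDate;
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(tokenIdentifier.length);
    out.write(tokenIdentifier);
    out.writeLong(renewDate);
  }
}
{code}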

 Timeline server needs to recover the timeline DT when restarting
 

 Key: YARN-2837
 URL: https://issues.apache.org/jira/browse/YARN-2837
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2837.1.patch, YARN-2837.2.patch, YARN-2837.3.patch, 
 YARN-2837.4.patch, YARN-2837.5.patch


 Timeline server needs to recover the stateful information when restarting as 
 RM/NM/JHS does now. So far the stateful information only includes the 
 timeline DT. Without recovery, the timeline DT of the existing YARN apps is 
 no longer valid, and cannot be renewed any more after the timeline server is 
 restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238675#comment-14238675
 ] 

Hadoop QA commented on YARN-2900:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685879/YARN-2900.patch
  against trunk revision ddffcd8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6041//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6041//console

This message is automatically generated.

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed

2014-12-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2920:
-
Attachment: YARN-2920.2.patch

Updated patch: dropped some unnecessary refactoring code which could cause a 
deadlock (tracked by YARN-2925).

Resolved the UT failures.

 CapacityScheduler should be notified when labels on nodes changed
 -

 Key: YARN-2920
 URL: https://issues.apache.org/jira/browse/YARN-2920
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2920.1.patch, YARN-2920.2.patch


 Currently, changes to labels on nodes will only be handled by 
 RMNodeLabelsManager, but that is not enough when labels on nodes change:
 - The scheduler should be able to take actions on running containers (like 
 kill/preempt/do-nothing).
 - Used / available capacity in the scheduler should be updated for future 
 planning.
 We need to add a new event to pass such updates to the scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238683#comment-14238683
 ] 

Hadoop QA commented on YARN-2931:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685860/YARN-2931.002.patch
  against trunk revision ddffcd8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6042//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6042//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6042//console

This message is automatically generated.

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA

[jira] [Commented] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238684#comment-14238684
 ] 

Hadoop QA commented on YARN-2837:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685889/YARN-2837.5.patch
  against trunk revision ddffcd8.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6043//console

This message is automatically generated.

 Timeline server needs to recover the timeline DT when restarting
 

 Key: YARN-2837
 URL: https://issues.apache.org/jira/browse/YARN-2837
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2837.1.patch, YARN-2837.2.patch, YARN-2837.3.patch, 
 YARN-2837.4.patch, YARN-2837.5.patch


 Timeline server needs to recover the stateful information when restarting as 
 RM/NM/JHS does now. So far the stateful information only includes the 
 timeline DT. Without recovery, the timeline DT of the existing YARN apps is 
 no longer valid, and cannot be renewed any more after the timeline server is 
 restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2014-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238704#comment-14238704
 ] 

Wangda Tan commented on YARN-2618:
--

[~ywskycn], the Capacity Scheduler already supports multi-dimensional resources via 
DominantResourceCalculator, and it should work once DRC is updated to support disk. 
The following statement is not true:
bq. This is because both Fifo and Capacity only care memory capacity when 
assigning containers to nodes
See {{CapacitySchedulerConfiguration.getResourceCalculator}}.
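
For illustration, a minimal, self-contained sketch of the idea (the DiskAwareResource 
class and its fields are made up for this example; this is not the actual Hadoop 
Resource/ResourceCalculator API): a dominant-resource comparison looks at every 
dimension, so once disk is added as a dimension the existing scheduler path can avoid 
over-allocating it.
{code}
public final class DiskAwareResourceDemo {
    // Hypothetical resource with memory, vcores, and a disk I/O dimension.
    static final class DiskAwareResource {
        final long memoryMB;
        final int vcores;
        final int diskIO;

        DiskAwareResource(long memoryMB, int vcores, int diskIO) {
            this.memoryMB = memoryMB;
            this.vcores = vcores;
            this.diskIO = diskIO;
        }

        // Dominant share: the largest ratio of any single dimension to the cluster total.
        double dominantShare(DiskAwareResource cluster) {
            return Math.max((double) memoryMB / cluster.memoryMB,
                   Math.max((double) vcores / cluster.vcores,
                            (double) diskIO / cluster.diskIO));
        }

        // A request only fits if it fits in *every* dimension, which is what
        // prevents over-allocating any single resource such as disk.
        boolean fitsIn(DiskAwareResource available) {
            return memoryMB <= available.memoryMB
                && vcores <= available.vcores
                && diskIO <= available.diskIO;
        }
    }

    public static void main(String[] args) {
        DiskAwareResource cluster = new DiskAwareResource(102400, 32, 100);
        DiskAwareResource request = new DiskAwareResource(4096, 2, 40);
        System.out.println("dominant share = " + request.dominantShare(cluster)); // 0.4 (disk dominates)
        System.out.println("fits = " + request.fitsIn(cluster));                  // true
    }
}
{code}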

 Avoid over-allocation of disk resources
 ---

 Key: YARN-2618
 URL: https://issues.apache.org/jira/browse/YARN-2618
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2618-1.patch, YARN-2618-2.patch


 Subtask of YARN-2139. 
 This should include
 - Add API support for introducing disk I/O as the 3rd type resource.
 - NM should report this information to the RM
 - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-12-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2837:
--
Attachment: (was: YARN-2837.5.patch)

 Timeline server needs to recover the timeline DT when restarting
 

 Key: YARN-2837
 URL: https://issues.apache.org/jira/browse/YARN-2837
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2837.1.patch, YARN-2837.2.patch, YARN-2837.3.patch, 
 YARN-2837.4.patch


 Timeline server needs to recover the stateful information when restarting as 
 RM/NM/JHS does now. So far the stateful information only includes the 
 timeline DT. Without recovery, the timeline DT of the existing YARN apps is 
 no longer valid, and cannot be renewed any more after the timeline server is 
 restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-12-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2837:
--
Attachment: YARN-2837.5.patch

 Timeline server needs to recover the timeline DT when restarting
 

 Key: YARN-2837
 URL: https://issues.apache.org/jira/browse/YARN-2837
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2837.1.patch, YARN-2837.2.patch, YARN-2837.3.patch, 
 YARN-2837.4.patch, YARN-2837.5.patch


 Timeline server needs to recover the stateful information when restarting as 
 RM/NM/JHS does now. So far the stateful information only includes the 
 timeline DT. Without recovery, the timeline DT of the existing YARN apps is 
 no longer valid, and cannot be renewed any more after the timeline server is 
 restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2931:

Attachment: YARN-2931.002.patch

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch, 
 YARN-2931.002.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}
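
As an aside, a minimal sketch of the guard the description implies (not the actual 
YARN-2931 patch; the class name and local path below are made up): make sure the 
public cache directory exists before a download is scheduled into it, instead of 
relying on a later private localization to create it.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class PublicCacheDirGuard {
    /**
     * Ensure the local public cache directory exists before scheduling a download
     * into it. Without such a guard, a download can fail with FileNotFoundException
     * when the directory was wiped but recovery state still points at it.
     */
    static Path ensureCacheDir(String localDir) throws IOException {
        Path cache = Paths.get(localDir, "filecache");
        // createDirectories is a no-op if the directory already exists.
        return Files.createDirectories(cache);
    }

    public static void main(String[] args) throws IOException {
        Path dir = ensureCacheDir("/tmp/yarn-nm-local");
        System.out.println("public cache dir ready: " + dir);
    }
}
{code}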



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238725#comment-14238725
 ] 

Anubhav Dhoot commented on YARN-2931:
-

The Findbugs warnings do not seem related to the patch. Uploading the patch again to retrigger the build.

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch, 
 YARN-2931.002.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238734#comment-14238734
 ] 

bc Wong commented on YARN-2931:
---

Thanks for the fix! Some nits:

ResourceLocalizationService.java
* Instead of commenting out code, I would just remove it.

TestResourceLocalizationService.java
* L950: Remove the code that is commented out.

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch, 
 YARN-2931.002.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2014-12-08 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238753#comment-14238753
 ] 

Wei Yan commented on YARN-2618:
---

Thanks for pointing that out, [~leftnoteasy]. I'll check that and update the test 
cases for the Capacity Scheduler.

 Avoid over-allocation of disk resources
 ---

 Key: YARN-2618
 URL: https://issues.apache.org/jira/browse/YARN-2618
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2618-1.patch, YARN-2618-2.patch


 Subtask of YARN-2139. 
 This should include
 - Add API support for introducing disk I/O as the 3rd type resource.
 - NM should report this information to the RM
 - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238763#comment-14238763
 ] 

Hadoop QA commented on YARN-2837:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685897/YARN-2837.5.patch
  against trunk revision ddffcd8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6045//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6045//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6045//console

This message is automatically generated.

 Timeline server needs to recover the timeline DT when restarting
 

 Key: YARN-2837
 URL: https://issues.apache.org/jira/browse/YARN-2837
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2837.1.patch, YARN-2837.2.patch, YARN-2837.3.patch, 
 YARN-2837.4.patch, YARN-2837.5.patch


 Timeline server needs to recover the stateful information when restarting as 
 RM/NM/JHS does now. So far the stateful information only includes the 
 timeline DT. Without recovery, the timeline DT of the existing YARN apps is 
 no longer valid, and cannot be renewed any more after the timeline server is 
 restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238770#comment-14238770
 ] 

Hadoop QA commented on YARN-2931:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685898/YARN-2931.002.patch
  against trunk revision ddffcd8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6046//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6046//console

This message is automatically generated.

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch, 
 YARN-2931.002.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238784#comment-14238784
 ] 

Hadoop QA commented on YARN-2920:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685892/YARN-2920.2.patch
  against trunk revision ddffcd8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1218 javac 
compiler warnings (more than the trunk's current 1217 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6044//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6044//artifact/patchprocess/patchReleaseAuditProblems.txt
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6044//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6044//console

This message is automatically generated.

 CapacityScheduler should be notified when labels on nodes changed
 -

 Key: YARN-2920
 URL: https://issues.apache.org/jira/browse/YARN-2920
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2920.1.patch, YARN-2920.2.patch


 Currently, changes to labels on nodes are only handled by RMNodeLabelsManager, 
 but that is not enough when the labels on nodes change:
 - The scheduler should be able to take actions on running containers (like 
 kill/preempt/do-nothing).
 - Used / available capacity in the scheduler should be updated for future 
 planning.
 We need to add a new event to pass such updates to the scheduler.
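
As a rough illustration, such an event could look like the sketch below (hypothetical 
class and field names, not taken from the actual patch):
{code}
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical event carrying node-label changes from RMNodeLabelsManager to the scheduler.
public class NodeLabelsUpdateSchedulerEvent {
    // node id (host:port) -> the new set of labels on that node
    private final Map<String, Set<String>> updatedNodeToLabels;

    public NodeLabelsUpdateSchedulerEvent(Map<String, Set<String>> updatedNodeToLabels) {
        this.updatedNodeToLabels = updatedNodeToLabels;
    }

    public Map<String, Set<String>> getUpdatedNodeToLabels() {
        return updatedNodeToLabels;
    }

    public static void main(String[] args) {
        // The scheduler would react to this by updating per-label capacity and
        // deciding whether to kill, preempt, or keep running containers.
        NodeLabelsUpdateSchedulerEvent e = new NodeLabelsUpdateSchedulerEvent(
            Collections.singletonMap("host1:8041", Collections.singleton("GPU")));
        System.out.println(e.getUpdatedNodeToLabels());
    }
}
{code}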



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2930) TestRMRestart#testRMRestartRecoveringNodeLabelManager sometimes fails against Java 8

2014-12-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-2930:


Assignee: Wangda Tan  (was: Rohith)

 TestRMRestart#testRMRestartRecoveringNodeLabelManager sometimes fails against 
 Java 8
 

 Key: YARN-2930
 URL: https://issues.apache.org/jira/browse/YARN-2930
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Wangda Tan
Priority: Minor

 From https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/31/console :
 {code}
 testRMRestartRecoveringNodeLabelManager[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
   Time elapsed: 0.136 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<2>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartRecoveringNodeLabelManager(TestRMRestart.java:2100)
 testRMRestartRecoveringNodeLabelManager[1](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
   Time elapsed: 0.081 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<2>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartRecoveringNodeLabelManager(TestRMRestart.java:2100)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2930) TestRMRestart#testRMRestartRecoveringNodeLabelManager sometimes fails against Java 8

2014-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238818#comment-14238818
 ] 

Wangda Tan commented on YARN-2930:
--

[~rohithsharma], I've looked into this and found the root cause, so I took over 
since it is causing other Jenkins jobs to fail.

It is caused by some other test(s) writing node labels to the FS; 
TestRMRestart.testRMRestartRecoveringNodeLabelManager then loads the previously 
written node label store from the FS when it starts.

I've prepared a patch that allocates a random temp directory for writing the node 
label data and cleans it up when the JVM exits. Please let me know your comments.

Thanks,
Wangda
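
For illustration, a minimal sketch of that approach under assumed names (not the 
actual patch): create a unique temp directory for the node label store and delete 
it from a shutdown hook when the JVM exits.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

public final class TempNodeLabelDirUtil {
    /** Create a unique temp directory for node label data and delete it when the JVM exits. */
    static Path createTempNodeLabelDir() throws IOException {
        Path dir = Files.createTempDirectory("node-labels-");
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                // Delete children first, then the directory itself (best-effort cleanup).
                Files.walk(dir)
                     .sorted(Comparator.reverseOrder())
                     .forEach(p -> p.toFile().delete());
            } catch (IOException ignored) {
                // Nothing to do if cleanup fails on exit.
            }
        }));
        return dir;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("node label store dir: " + createTempNodeLabelDir());
    }
}
{code}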

 TestRMRestart#testRMRestartRecoveringNodeLabelManager sometimes fails against 
 Java 8
 

 Key: YARN-2930
 URL: https://issues.apache.org/jira/browse/YARN-2930
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Rohith
Priority: Minor

 From https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/31/console :
 {code}
 testRMRestartRecoveringNodeLabelManager[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
   Time elapsed: 0.136 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<2>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartRecoveringNodeLabelManager(TestRMRestart.java:2100)
 testRMRestartRecoveringNodeLabelManager[1](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
   Time elapsed: 0.081 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<2>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartRecoveringNodeLabelManager(TestRMRestart.java:2100)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2930) TestRMRestart#testRMRestartRecoveringNodeLabelManager sometimes fails against Java 8

2014-12-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2930:
-
Attachment: YARN-2930.1.patch

 TestRMRestart#testRMRestartRecoveringNodeLabelManager sometimes fails against 
 Java 8
 

 Key: YARN-2930
 URL: https://issues.apache.org/jira/browse/YARN-2930
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Wangda Tan
Priority: Minor
 Attachments: YARN-2930.1.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/31/console :
 {code}
 testRMRestartRecoveringNodeLabelManager[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
   Time elapsed: 0.136 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<2>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartRecoveringNodeLabelManager(TestRMRestart.java:2100)
 testRMRestartRecoveringNodeLabelManager[1](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
   Time elapsed: 0.081 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<2>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartRecoveringNodeLabelManager(TestRMRestart.java:2100)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException

2014-12-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238838#comment-14238838
 ] 

Karthik Kambatla commented on YARN-2910:


Here is the deadlock Wilfred was mentioning:
{noformat}
FairSchedulerContinuousScheduling:
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:553)
- waiting to lock 0x0007f6bc8f58 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:769)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:228)
- locked 0x0007f6b5ec00 (a 
java.util.Collections$SynchronizedRandomAccessList)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1072)
- locked 0x0007f68f25e8 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1005)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:280)
Thread-434:
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:152)
- waiting to lock 0x0007f6b5ec00 (a 
java.util.Collections$SynchronizedRandomAccessList)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
- locked 0x0007f6bc8f58 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:939)
- locked 0x0007f6bc8f58 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3509)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
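
The two stacks take the same pair of locks in opposite order, which is the classic 
lock-ordering deadlock. A stripped-down, plain-Java illustration of the pattern (not 
the scheduler code; running it is expected to hang, which is the point):
{code}
public class LockOrderDeadlockDemo {
    private static final Object appAttemptLock = new Object();   // stands in for the FSAppAttempt monitor
    private static final Object runnableAppsLock = new Object(); // stands in for the synchronized list

    public static void main(String[] args) {
        // Continuous-scheduling thread: list lock first, then app attempt.
        Thread scheduling = new Thread(() -> {
            synchronized (runnableAppsLock) {
                sleep(100);
                synchronized (appAttemptLock) {
                    System.out.println("assignContainer");
                }
            }
        });
        // Allocate thread: app attempt first, then list lock (opposite order -> deadlock).
        Thread allocate = new Thread(() -> {
            synchronized (appAttemptLock) {
                sleep(100);
                synchronized (runnableAppsLock) {
                    System.out.println("getResourceUsage");
                }
            }
        });
        scheduling.start();
        allocate.start();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
{code}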

 FSLeafQueue can throw ConcurrentModificationException
 -

 Key: YARN-2910
 URL: https://issues.apache.org/jira/browse/YARN-2910
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
 Attachments: FSLeafQueue_concurrent_exception.txt, 
 YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, 
 YARN-2910.4.patch, YARN-2910.patch


 The lists that maintain the runnable and the non-runnable apps are standard 
 ArrayLists, but there is no guarantee that they will only be manipulated by one 
 thread in the system. This can lead to the following exception:
 {noformat}
 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM.
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
 at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
 at java.util.ArrayList$Itr.next(ArrayList.java:831)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
 {noformat}
 Full stack trace in the attached 

[jira] [Commented] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException

2014-12-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238884#comment-14238884
 ] 

Karthik Kambatla commented on YARN-2910:


Looking around, we don't need the synchronization for FSAppAttempt#getHeadroom. 
That and changing the locking to use read-write locks should get us a long way 
towards avoiding this situation. Also, if we are locking on each access, we 
should be able to drop the use of Collections.synchronizedList. 
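
A minimal sketch of that direction (a simplified stand-in class with assumed names, 
not the actual patch): guard the runnable-apps list with a ReentrantReadWriteLock so 
that readers such as getResourceUsage iterate under the read lock rather than a 
synchronized list or the queue monitor.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for FSLeafQueue's runnable-apps bookkeeping.
public class LeafQueueApps {
    private final List<Integer> runnableAppMemoryMB = new ArrayList<>(); // per-app usage, simplified
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void addApp(int memoryMB) {
        lock.writeLock().lock();
        try {
            runnableAppMemoryMB.add(memoryMB);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers iterate under the read lock only: no synchronized(list) and no queue
    // monitor, so a concurrent writer cannot trigger ConcurrentModificationException
    // and the read path never participates in the lock-ordering cycle shown above.
    long getResourceUsage() {
        lock.readLock().lock();
        try {
            long total = 0;
            for (int mb : runnableAppMemoryMB) {
                total += mb;
            }
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        LeafQueueApps queue = new LeafQueueApps();
        queue.addApp(1024);
        queue.addApp(2048);
        System.out.println("usage MB = " + queue.getResourceUsage()); // 3072
    }
}
{code}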

 FSLeafQueue can throw ConcurrentModificationException
 -

 Key: YARN-2910
 URL: https://issues.apache.org/jira/browse/YARN-2910
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
 Attachments: FSLeafQueue_concurrent_exception.txt, 
 YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, 
 YARN-2910.4.patch, YARN-2910.patch


 The lists that maintain the runnable and the non-runnable apps are standard 
 ArrayLists, but there is no guarantee that they will only be manipulated by one 
 thread in the system. This can lead to the following exception:
 {noformat}
 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM.
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
 at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
 at java.util.ArrayList$Itr.next(ArrayList.java:831)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
 {noformat}
 Full stack trace in the attached file.
 We should guard against that by using a thread-safe alternative such as 
 java.util.concurrent.CopyOnWriteArrayList.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2931:

Attachment: YARN-2931.003.patch

Addressed the comments and made the test more robust in verifying the fix.

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch, 
 YARN-2931.002.patch, YARN-2931.003.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238902#comment-14238902
 ] 

Rohith commented on YARN-2762:
--

Findbugs warnings have been generated for all the System.err and System.out usages, 
including ones in other class files as well. I suspect some Findbugs rule has been 
modified?

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
 YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
 YARN-2762.patch


 All NodeLabel argument validations are done on the server side. The same can be 
 done in RMAdminCLI so that unnecessary RPC calls are avoided.
 For input such as x,y,,z,, there is no need to add an empty string; it can simply 
 be skipped.
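
A minimal sketch of the client-side normalization this asks for (illustrative only, 
not the actual patch): trim each label and skip empty entries before any RPC is made.
{code}
import java.util.ArrayList;
import java.util.List;

public final class NodeLabelArgs {
    /** Split a comma-separated label argument, trimming whitespace and skipping empty entries. */
    static List<String> parseLabels(String arg) {
        List<String> labels = new ArrayList<>();
        if (arg == null) {
            return labels;
        }
        for (String label : arg.split(",")) {
            String trimmed = label.trim();
            if (!trimmed.isEmpty()) {
                labels.add(trimmed);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // "x, y,,z," -> [x, y, z]; an empty result could be rejected before any RPC is made.
        System.out.println(parseLabels("x, y,,z,"));
    }
}
{code}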



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook

2014-12-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238905#comment-14238905
 ] 

Rohith commented on YARN-2917:
--

Hi [~kasha], [~jianhe], [~vinodkv], kindly review the analysis and the patch. This 
issue is causing the RM to hang.
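
To make the hang pattern concrete, here is a stripped-down plain-Java illustration 
(not the YARN classes; running it is expected to hang): the dispatcher thread calls 
System.exit() while handling an event, the shutdown hook then waits for the 
dispatcher to drain, but the dispatcher thread is itself blocked inside exit() 
waiting for the hooks to finish, so the "Waiting for ... to drain" message repeats 
forever.
{code}
import java.util.concurrent.LinkedBlockingQueue;

public class DrainOnExitHangDemo {
    private static final LinkedBlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
    private static volatile boolean drained = false;

    public static void main(String[] args) throws Exception {
        // Plays the role of serviceStop() in the shutdown hook: wait until the dispatcher drains.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            while (!drained) {
                System.out.println("Waiting for dispatcher to drain.");
                try { Thread.sleep(500); } catch (InterruptedException ignored) { }
            }
        }));

        // Plays the role of the AsyncDispatcher event-handling thread.
        Thread dispatcher = new Thread(() -> {
            try {
                while (true) {
                    eventQueue.take().run(); // never returns once the handler calls exit()
                    drained = eventQueue.isEmpty();
                }
            } catch (InterruptedException ignored) { }
        });
        dispatcher.start();

        // A handler that decides to exit: System.exit() blocks this thread until the
        // shutdown hooks finish, but the hook above waits for this thread to drain -> hang.
        eventQueue.put(() -> System.exit(1));
    }
}
{code}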

 Potential deadlock in AsyncDispatcher when system.exit called in 
 AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook
 

 Key: YARN-2917
 URL: https://issues.apache.org/jira/browse/YARN-2917
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2917.patch


 I encountered a scenario where the RM hung while shutting down and kept logging 
 {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
 Waiting for AsyncDispatcher to drain.}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2931:

Attachment: YARN-2931.004.patch

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch, 
 YARN-2931.002.patch, YARN-2931.003.patch, YARN-2931.004.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2931:
---
Priority: Critical  (was: Major)

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch, 
 YARN-2931.002.patch, YARN-2931.003.patch, YARN-2931.004.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2931) PublicLocalizer may fail with FileNotFoundException until directory gets initialized by LocalizeRunner

2014-12-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238919#comment-14238919
 ] 

Karthik Kambatla commented on YARN-2931:


+1, pending Jenkins. 

 PublicLocalizer may fail with FileNotFoundException until directory gets 
 initialized by LocalizeRunner
 --

 Key: YARN-2931
 URL: https://issues.apache.org/jira/browse/YARN-2931
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Attachments: YARN-2931.001.patch, YARN-2931.002.patch, 
 YARN-2931.002.patch, YARN-2931.003.patch, YARN-2931.004.patch


 When the data directory is cleaned up and NM is started with existing 
 recovery state, because of YARN-90, it will not recreate the local dirs.
 This causes a PublicLocalizer to fail until getInitializedLocalDirs is called 
 due to some LocalizeRunner for private localization.
 Example error 
 {noformat}
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { hdfs:/blah 
 machine:8020/tmp/hive-hive/hive_2014-12-02_22-56-58_741_2045919883676051996-3/-mr-10004/8060c9dd-54b6-42fc-9d77-34b655fa5e82/reduce.xml,
  1417589819618, FILE, null 
 },pending,[(container_1417589109512_0001_02_03)],119413444132127,DOWNLOADING}
 java.io.FileNotFoundException: File /data/yarn/nm/filecache does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:162)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
   at org.apache.hadoop.yarn.util.FSDownload.createDir(FSDownload.java:104)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:351)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 2014-12-02 22:57:32,629 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1417589109512_0001_02_03 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

