[jira] [Created] (YARN-4795) ContainerMetrics drops records

2016-03-11 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-4795:
--

 Summary: ContainerMetrics drops records
 Key: YARN-4795
 URL: https://issues.apache.org/jira/browse/YARN-4795
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.9.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton


The metrics2 system was designed for persistent sources.  
{{ContainerMetrics}} is an ephemeral source, and that mismatch causes problems.  
Specifically, {{ContainerMetrics}} reports metrics only once, after the 
container has been stopped.  This behavior is a problem because the metrics2 
system may ask a source for a report that the interested sinks then quietly 
drop.  (That's a metrics2 feature, not a bug.)  If that single final report is 
silently dropped, the data is lost, because {{ContainerMetrics}} will never 
report anything again.
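
For illustration, a schematic sketch of the behavior described above (hypothetical names and a deliberately simplified shape; this is not the real ContainerMetrics or metrics2 code):
{code}
import java.util.Optional;

// Schematic one-shot source: it emits its final record exactly once after the
// container stops, then stays silent forever.
public class OneShotContainerSource {
  private boolean containerFinished = false;
  private boolean finalRecordEmitted = false;

  void onContainerStopped() {
    containerFinished = true;
  }

  // Called by the collection system on every snapshot cycle.
  Optional<String> getMetrics() {
    if (containerFinished && !finalRecordEmitted) {
      finalRecordEmitted = true;
      // If the sink quietly drops this single record (which metrics2 allows),
      // the container's metrics are gone for good.
      return Optional.of("final container usage record");
    }
    return Optional.empty();
  }
}
{code}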





[jira] [Updated] (YARN-4719) Add a helper library to maintain node state and allows common queries

2016-03-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4719:
---
Attachment: yarn-4719-6.patch

> Add a helper library to maintain node state and allows common queries
> -
>
> Key: YARN-4719
> URL: https://issues.apache.org/jira/browse/YARN-4719
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4719-1.patch, yarn-4719-2.patch, yarn-4719-3.patch, 
> yarn-4719-4.patch, yarn-4719-5.patch, yarn-4719-6.patch
>
>
> The scheduler could use a helper library to maintain node state and allow 
> matching/sorting queries. Several reasons for this:
> # Today, much of the node state management is done separately in each 
> scheduler. Having a single library takes us that much closer to reducing 
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality 
> significantly. 
> # An API that returns a list sorted by a custom comparator would help 
> YARN-1011, where we want to sort by allocation and utilization for 
> continuous/asynchronous and opportunistic scheduling respectively. 





[jira] [Updated] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-03-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4108:
-
Attachment: YARN-4108.9.patch

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, 
> YARN-4108.2.patch, YARN-4108.3.patch, YARN-4108.4.patch, YARN-4108.5.patch, 
> YARN-4108.6.patch, YARN-4108.7.patch, YARN-4108.8.patch, YARN-4108.9.patch, 
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch, YARN-4108.poc.3-WIP.patch, 
> YARN-4108.poc.4-WIP.patch
>
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*
> 1) Can handle the user-limit preemption case
> 2) Can handle resource placement requirements, such as: hard locality 
> (I only want to use rack-1) / node constraints (YARN-3409) / blacklists (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), 
> cross-application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).





[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries

2016-03-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191596#comment-15191596
 ] 

Karthik Kambatla commented on YARN-4719:


bq. Can we make ClusterNodeTracker an interface and make the current one in the 
latest patch a Default Implementation?
Good idea, but as Wangda says, we can do this lazily. The longer we wait, the 
better an understanding we will have of what the interface should look like. 

Will update the patch to fix the nit shortly. 
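
For context, a rough, hypothetical sketch of what such a node-tracking interface might look like (names and signatures are assumptions here, not the actual ClusterNodeTracker from the patch):
{code}
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical interface sketch; not the actual ClusterNodeTracker API.
interface NodeTracker<N> {
  void addNode(N node);      // node joins the cluster
  void removeNode(N node);   // node is lost or decommissioned

  // Filtering/matching query, e.g. by node label or locality.
  List<N> getNodes(Predicate<N> filter);

  // Sorted view for a custom comparator, e.g. by allocation or utilization.
  List<N> sortedNodes(Comparator<N> comparator);
}
{code}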

> Add a helper library to maintain node state and allows common queries
> -
>
> Key: YARN-4719
> URL: https://issues.apache.org/jira/browse/YARN-4719
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4719-1.patch, yarn-4719-2.patch, yarn-4719-3.patch, 
> yarn-4719-4.patch, yarn-4719-5.patch
>
>
> The scheduler could use a helper library to maintain node state and allow 
> matching/sorting queries. Several reasons for this:
> # Today, much of the node state management is done separately in each 
> scheduler. Having a single library takes us that much closer to reducing 
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality 
> significantly. 
> # An API that returns a list sorted by a custom comparator would help 
> YARN-1011, where we want to sort by allocation and utilization for 
> continuous/asynchronous and opportunistic scheduling respectively. 





[jira] [Updated] (YARN-4784) fair scheduler: defaultQueueSchedulingPolicy should not accept fifo as a value

2016-03-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4784:
---
Fix Version/s: (was: 2.9.0)

> fair scheduler: defaultQueueSchedulingPolicy should not accept fifo as a value
> --
>
> Key: YARN-4784
> URL: https://issues.apache.org/jira/browse/YARN-4784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-4784.001.patch
>
>
> The configuration item defaultQueueSchedulingPolicy should not accept fifo as 
> a value, since fifo is not a valid policy for non-leaf queues.





[jira] [Updated] (YARN-4784) fair scheduler: defaultQueueSchedulingPolicy should not accept fifo as a value

2016-03-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4784:
---
Component/s: (was: yarn)

> fair scheduler: defaultQueueSchedulingPolicy should not accept fifo as a value
> --
>
> Key: YARN-4784
> URL: https://issues.apache.org/jira/browse/YARN-4784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-4784.001.patch
>
>
> The configuration item defaultQueueSchedulingPolicy should not accept fifo as 
> a value, since fifo is not a valid policy for non-leaf queues.





[jira] [Commented] (YARN-3054) Preempt policy in FairScheduler may cause mapreduce job never finish

2016-03-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191556#comment-15191556
 ] 

Karthik Kambatla commented on YARN-3054:


Also, the approach being proposed in YARN-4752 doesn't take container priority 
into consideration anymore. So, we shouldn't have a systemic bias towards 
preempting low-priority containers. 

> Preempt policy in FairScheduler may cause mapreduce job never finish
> 
>
> Key: YARN-3054
> URL: https://issues.apache.org/jira/browse/YARN-3054
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>
> The preemption policy is tied to the scheduling policy now. Using the 
> scheduling policy's comparator to find preemption candidates cannot guarantee 
> that some subset of containers is never preempted. This may cause tasks to be 
> preempted repeatedly before they finish, so the job cannot make any progress. 
> I think preemption in YARN should give the assurances below:
> 1. MapReduce jobs can get additional resources when others are idle;
> 2. MapReduce jobs for one user in one queue can still make progress with 
> their min share when others preempt resources back.
> Maybe always preempting the latest app and container can achieve this? 





[jira] [Commented] (YARN-3054) Preempt policy in FairScheduler may cause mapreduce job never finish

2016-03-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191554#comment-15191554
 ] 

Karthik Kambatla commented on YARN-3054:


Also, the approach being proposed in YARN-4752 doesn't take container priority 
into consideration anymore. So, we shouldn't have a systemic bias towards 
preempting low-priority containers. 

> Preempt policy in FairScheduler may cause mapreduce job never finish
> 
>
> Key: YARN-3054
> URL: https://issues.apache.org/jira/browse/YARN-3054
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>
> The preemption policy is tied to the scheduling policy now. Using the 
> scheduling policy's comparator to find preemption candidates cannot guarantee 
> that some subset of containers is never preempted. This may cause tasks to be 
> preempted repeatedly before they finish, so the job cannot make any progress. 
> I think preemption in YARN should give the assurances below:
> 1. MapReduce jobs can get additional resources when others are idle;
> 2. MapReduce jobs for one user in one queue can still make progress with 
> their min share when others preempt resources back.
> Maybe always preempting the latest app and container can achieve this? 





[jira] [Updated] (YARN-4792) FairScheduler should do sanity checks on its configuration

2016-03-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4792:
---
Component/s: (was: yarn)

> FairScheduler should do sanity checks on its configuration
> --
>
> Key: YARN-4792
> URL: https://issues.apache.org/jira/browse/YARN-4792
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> The FairScheduler does not perform any sanity checks on the configuration 
> that it uses.
> This can lead to a configuration that is legal but does not make sense to use 
> in a cluster, and causes support cases to be filed. Examples:
> * limiting the root queue to a certain size for memory or vcores: the cluster 
> resources already limit the root queue, so there should be no need for an 
> extra artificial limit
> * setting max running applications on a leaf queue larger than on the parent 
> queue(s)
> * setting max running applications larger than the queue size allows 
> (example: minimum vcore allocation * number of apps > total vcores in the 
> queue)
> There are possibly more checks. (A rough sketch of such checks appears below.)
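
For illustration, a rough, hypothetical sketch of the kind of checks described above (names and thresholds are assumptions, not the actual FairScheduler code or a proposed patch):
{code}
import java.util.ArrayList;
import java.util.List;

final class AllocationSanityCheckSketch {
  // Returns human-readable warnings for configurations that are legal but
  // almost certainly not what the operator intended.
  static List<String> check(long rootMaxVcores, long clusterVcores,
      int leafMaxRunningApps, int parentMaxRunningApps) {
    List<String> warnings = new ArrayList<>();
    // The cluster already bounds the root queue; a smaller explicit cap is
    // an extra artificial limit.
    if (rootMaxVcores < clusterVcores) {
      warnings.add("root queue vcore limit is below the total cluster vcores");
    }
    // A leaf queue can never run more apps than its parent allows.
    if (leafMaxRunningApps > parentMaxRunningApps) {
      warnings.add("leaf maxRunningApps exceeds its parent's maxRunningApps");
    }
    return warnings;
  }
}
{code}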





[jira] [Resolved] (YARN-1961) Fair scheduler preemption doesn't work for non-leaf queues

2016-03-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-1961.

Resolution: Won't Fix

Closing this as "Won't Fix". Please reopen if there is a good reason to do 
this. 

> Fair scheduler preemption doesn't work for non-leaf queues
> --
>
> Key: YARN-1961
> URL: https://issues.apache.org/jira/browse/YARN-1961
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, scheduler
>Affects Versions: 2.4.0
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: scheduler
>
> Setting minResources and minSharePreemptionTimeout to a non-leaf queue 
> doesn't cause preemption to happen when that non-leaf queue is below 
> minResources and there are outstanding demands in that non-leaf queue.
> Here is an example fs allocation config (partial):
> {code:xml}
> <queue name="abc">
>   <minResources>3072 mb,0 vcores</minResources>
>   <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
>   <!-- child queues of abc omitted -->
> </queue>
> {code}
> With the above configs, preemption doesn't seem to happen if queue abc is 
> below minShare and it has outstanding unsatisfied demands from apps in its 
> child queues. Ideally in such cases we would like preemption to kick off and 
> reclaim resources from other queues (not under queue abc).
> Looking at the code it seems like preemption checks for starvation only at 
> the leaf queue level and not at the parent level.
> {code:title=FairScheduler.java|borderStyle=solid}
> boolean isStarvedForMinShare(FSLeafQueue sched)
> boolean isStarvedForFairShare(FSLeafQueue sched)
> {code}
> This affects our use case, where we have a parent queue with probably a 
> hundred unconfigured leaf queues under it. We want to give a minshare to the 
> parent queue to protect all the leaf queues under it, but we cannot do so 
> because of this bug.





[jira] [Updated] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes

2016-03-11 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated YARN-4006:
-
Attachment: YARN-4006-branch-trunk.patch

Updated to handle an error. Sample code is coming shortly.

> YARN ATS Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: YARN-4006-branch-trunk.patch, YARN-4006-branch2.6.0.patch
>
>
> When attempting to use the Hadoop alternate authentication classes, they do 
> not work exactly with what was built in 
> https://issues.apache.org/jira/browse/YARN-1935.
> I went ahead and made the following changes to support using a custom 
> AltKerberos DelegationToken class.
> Changes to: TimelineAuthenticationFilterInitializer.class
> {code}
> String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: " + authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>       PseudoDelegationTokenAuthenticationHandler.class.getName());
>   LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE)
>     || (UserGroupInformation.isSecurityEnabled()
>         && conf.get("hadoop.security.authentication")
>             .equals(KerberosAuthenticationHandler.TYPE))) {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE, authType);
>     LOG.info("AuthType: " + authType);
>   } else {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>         KerberosDelegationTokenAuthenticationHandler.class.getName());
>     LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   }
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>       filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
>     try {
>       principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
>     } catch (IOException ex) {
>       throw new RuntimeException(
>           "Could not resolve Kerberos principal name: " + ex.toString(), ex);
>     }
>     filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL, principal);
>   }
> }
> {code}





[jira] [Updated] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes

2016-03-11 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated YARN-4006:
-
Attachment: (was: YARN-4006-branch-trunk.patch)

> YARN ATS Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: YARN-4006-branch2.6.0.patch
>
>
> When attempting to use the Hadoop alternate authentication classes, they do 
> not work exactly with what was built in 
> https://issues.apache.org/jira/browse/YARN-1935.
> I went ahead and made the following changes to support using a custom 
> AltKerberos DelegationToken class.
> Changes to: TimelineAuthenticationFilterInitializer.class
> {code}
> String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: " + authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>       PseudoDelegationTokenAuthenticationHandler.class.getName());
>   LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE)
>     || (UserGroupInformation.isSecurityEnabled()
>         && conf.get("hadoop.security.authentication")
>             .equals(KerberosAuthenticationHandler.TYPE))) {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE, authType);
>     LOG.info("AuthType: " + authType);
>   } else {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>         KerberosDelegationTokenAuthenticationHandler.class.getName());
>     LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   }
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>       filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
>     try {
>       principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
>     } catch (IOException ex) {
>       throw new RuntimeException(
>           "Could not resolve Kerberos principal name: " + ex.toString(), ex);
>     }
>     filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL, principal);
>   }
> }
> {code}





[jira] [Commented] (YARN-3054) Preempt policy in FairScheduler may cause mapreduce job never finish

2016-03-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191538#comment-15191538
 ] 

Karthik Kambatla commented on YARN-3054:


Thanks [~peng.zhang]. I understand it now. 

The elegant way of handling this would be to have a preemption priority or even 
a preemption cost per container, which is different from the priority that is 
used for allocation. That is a larger conversation to be had. Let us move this 
out of this umbrella and look at it for both schedulers together. 

That said, I would expect MapReduce to realize that pending mappers are blocked 
on waiting reducers and resolve this. MAPREDUCE-6302 and co. attempt to fix 
this, so you shouldn't see issues with job completion itself. 
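
For illustration, a tiny, hypothetical sketch of what "a preemption cost per container, distinct from the allocation priority" could look like (not an existing YARN API or a proposed patch):
{code}
import java.util.Comparator;

class PreemptionCandidate {
  final String containerId;
  final int allocationPriority;  // the priority the scheduler used to place it
  final long preemptionCost;     // e.g. an estimate of the work lost if killed

  PreemptionCandidate(String containerId, int allocationPriority,
      long preemptionCost) {
    this.containerId = containerId;
    this.allocationPriority = allocationPriority;
    this.preemptionCost = preemptionCost;
  }

  // Pick victims by cost rather than by allocation priority, so repeatedly
  // preempting the same nearly finished tasks becomes less likely.
  static final Comparator<PreemptionCandidate> BY_PREEMPTION_COST =
      Comparator.comparingLong(c -> c.preemptionCost);
}
{code}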

> Preempt policy in FairScheduler may cause mapreduce job never finish
> 
>
> Key: YARN-3054
> URL: https://issues.apache.org/jira/browse/YARN-3054
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>
> The preemption policy is tied to the scheduling policy now. Using the 
> scheduling policy's comparator to find preemption candidates cannot guarantee 
> that some subset of containers is never preempted. This may cause tasks to be 
> preempted repeatedly before they finish, so the job cannot make any progress. 
> I think preemption in YARN should give the assurances below:
> 1. MapReduce jobs can get additional resources when others are idle;
> 2. MapReduce jobs for one user in one queue can still make progress with 
> their min share when others preempt resources back.
> Maybe always preempting the latest app and container can achieve this? 





[jira] [Commented] (YARN-4150) Failure in TestNMClient because nodereports were not available

2016-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191537#comment-15191537
 ] 

Hadoop QA commented on YARN-4150:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 30s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 48s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_74 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_95 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_95 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | 

[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191518#comment-15191518
 ] 

Hadoop QA commented on YARN-4108:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 1s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74
 with JDK v1.8.0_74 generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) 
{color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 56 new + 477 unchanged - 15 fixed = 533 total (was 492) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 3s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 44s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 162m 31s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK 

[jira] [Commented] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs

2016-03-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191508#comment-15191508
 ] 

Naganarasimha G R commented on YARN-4545:
-

Hi [~gtCarrera],
Overall the approach is fine, and having the method in TimelineUtils is the 
better approach. Some small nits:
# In ApplicationMaster, getConf has default access and carries the 
@VisibleForTesting and @Private annotations. I checked the references and it 
does not seem to be used anywhere apart from NMCallbackHandler, so I think it 
can be made private and we can remove the annotations.
# {{publishContainerStartEvent}} is kept static and private so that it can be 
accessed from the static class NMCallbackHandler, but since we have a 
reference to ApplicationMaster, I don't think it needs to be static. We can 
call it through the reference directly, and then we neither need to get the 
conf object in NMCallbackHandler nor add an extra argument to 
{{publishContainerStartEvent}}.
# Similar nits exist for the other static methods. I don't see any necessity 
for static methods, as RMCallbackHandler is *not* a static inner class, and 
even if it were required, it would be better to hold a reference to 
ApplicationMaster.
# There seems to be a findbugs issue: either we need to add it to the findbugs 
exclude file, or better, remove {{private Configuration configuration}}, since 
it is available from the parent via *getConfig*.
# Some of the checkstyle issues (like lines longer than 80 characters) can be 
fixed.

> Allow YARN distributed shell to use ATS v1.5 APIs
> -
>
> Key: YARN-4545
> URL: https://issues.apache.org/jira/browse/YARN-4545
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4545-YARN-4265.001.patch, 
> YARN-4545-trunk.001.patch, YARN-4545-trunk.002.patch, 
> YARN-4545-trunk.003.patch, YARN-4545-trunk.004.patch, 
> YARN-4545-trunk.005.patch, YARN-4545-trunk.006.patch, 
> YARN-4545-trunk.007.patch
>
>
> We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to 
> allow distributed shell post data with ATS v1.5 API if 1.5 is enabled in the 
> system. We also need to provide a sample plugin to read those data out. 





[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service

2016-03-11 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191478#comment-15191478
 ] 

Giovanni Matteo Fumarola commented on YARN-4117:


Increased the timeout for the tests.

> End to end unit test with mini YARN cluster for AMRMProxy Service
> -
>
> Key: YARN-4117
> URL: https://issues.apache.org/jira/browse/YARN-4117
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Kishore Chaliparambil
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-4117.v0.patch, YARN-4117.v1.patch, 
> YARN-4117.v2.patch, YARN-4117.v3.patch, YARN-4117.v4.patch
>
>
> YARN-2884 introduces a proxy between the AM and the RM. This JIRA proposes an 
> end-to-end unit test of the AMRMProxy service using a mini YARN cluster. The 
> test will validate application register, allocate, and finish, as well as 
> token renewal.
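
For reference, a rough skeleton of a mini-YARN-cluster-based test (illustrative only: the MiniYARNCluster constructor and lifecycle calls are recalled from memory, and the AMRMProxy-specific steps are left as comments rather than invented API calls):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class AmrmProxyEndToEndSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // (Enabling the AMRMProxy on the NodeManager would go here; the exact
    // configuration keys from YARN-2884 are not assumed in this sketch.)
    MiniYARNCluster cluster =
        new MiniYARNCluster("amrmproxy-e2e", 1 /* node managers */, 1, 1);
    cluster.init(conf);
    cluster.start();
    try {
      // 1. registerApplicationMaster through the proxy
      // 2. allocate containers and check the responses
      // 3. finishApplicationMaster and verify token renewal behavior
    } finally {
      cluster.stop();
    }
  }
}
{code}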





[jira] [Updated] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service

2016-03-11 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-4117:
---
Attachment: YARN-4117.v4.patch

> End to end unit test with mini YARN cluster for AMRMProxy Service
> -
>
> Key: YARN-4117
> URL: https://issues.apache.org/jira/browse/YARN-4117
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Kishore Chaliparambil
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-4117.v0.patch, YARN-4117.v1.patch, 
> YARN-4117.v2.patch, YARN-4117.v3.patch, YARN-4117.v4.patch
>
>
> YARN-2884 introduces a proxy between the AM and the RM. This JIRA proposes an 
> end-to-end unit test of the AMRMProxy service using a mini YARN cluster. The 
> test will validate application register, allocate, and finish, as well as 
> token renewal.





[jira] [Commented] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes

2016-03-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191453#comment-15191453
 ] 

Jian He commented on YARN-4794:
---

There's a deadlock in NMClient. 
In NMClientImpl#startContainer, the calling thread first grabs the 
"startingContainer" lock and, if an exception happens, calls 
removeStartedContainer, which grabs the "NMClient" lock. 
On the other hand, if the NMClient is stopping in the meantime, that thread 
first calls cleanupRunningContainers, which takes the "NMClient" lock, and then 
calls stopContainer, which takes the "container" lock. The two threads acquire 
the locks in opposite order: hence the deadlock.
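
A minimal sketch of the opposite lock-ordering pattern described above (hypothetical class, field, and method names; not the actual NMClientImpl code):
{code}
public class LockOrderDeadlockSketch {
  private final Object clientLock = new Object();     // stands in for the "NMClient" lock
  private final Object containerLock = new Object();  // stands in for the per-container lock

  // Path 1: startContainer -> (on exception) removeStartedContainer
  void startContainerPath() {
    synchronized (containerLock) {    // holds the container lock...
      synchronized (clientLock) {     // ...then asks for the client lock
        // remove the container from the started-containers set
      }
    }
  }

  // Path 2: stop -> cleanupRunningContainers -> stopContainer
  void cleanupPath() {
    synchronized (clientLock) {       // holds the client lock...
      synchronized (containerLock) {  // ...then asks for the container lock
        // stop the container
      }
    }
  }
  // If one thread runs startContainerPath() while another runs cleanupPath(),
  // each can end up holding one lock and waiting forever for the other:
  // a classic lock-ordering deadlock.
}
{code}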


> Distributed shell app gets stuck on stopping containers after App completes
> ---
>
> Key: YARN-4794
> URL: https://issues.apache.org/jira/browse/YARN-4794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
>
> Distributed shell app gets stuck on stopping containers after App completes 
> with the following exception
> {code:title = app log}
> 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to 
> the server : java.nio.channels.ClosedByInterruptException
> {code}





[jira] [Updated] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes

2016-03-11 Thread Sumana Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumana Sathish updated YARN-4794:
-
Description: 
Distributed shell app gets stuck on stopping containers after App completes 
with the following exception
{code:title = app log}
15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application 
completed. Stopping running containers
15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to 
the server : java.nio.channels.ClosedByInterruptException
{code}

  was:
Distributed shell app gets stuck on stopping containers after App completes 
with the following exception
{code:title = app log}
16/03/10 23:19:41 WARN ipc.Client: Exception encountered while connecting to 
the server : 
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:407)
at 
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:723)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:722)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
at org.apache.hadoop.ipc.Client.call(Client.java:1397)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy30.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy31.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:382)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:368)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:503)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:562)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}


> Distributed shell app gets stuck on stopping containers after App completes
> ---
>
> Key: YARN-4794
> URL: https://issues.apache.org/jira/browse/YARN-4794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian 

[jira] [Updated] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes

2016-03-11 Thread Sumana Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumana Sathish updated YARN-4794:
-
Description: 
Distributed shell app gets stuck on stopping containers after App completes 
with the following exception
{code:title = app log}
16/03/10 23:19:41 WARN ipc.Client: Exception encountered while connecting to 
the server : 
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:407)
at 
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:723)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:722)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
at org.apache.hadoop.ipc.Client.call(Client.java:1397)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy30.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy31.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:382)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:368)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:503)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:562)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

  was:
Distributed shell app gets stuck on stopping containers after App completes 
with the following exception
{code:title = app log}
16/03/10 23:19:41 INFO distributedshell.ApplicationMaster: Application 
completed. Stopping running containers
16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: NM Client is being stopped.
16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: Waiting for 
eventDispatcherThread to be interrupted.
16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: eventDispatcherThread exited.
16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: Stopping NM client.
16/03/10 23:19:41 INFO impl.NMClientImpl: Clean up running containers on stop.
16/03/10 23:19:41 INFO impl.NMClientImpl: Stopping 
container_e05_1457650296862_0046_01_08
16/03/10 23:19:41 INFO impl.NMClientImpl: 

[jira] [Assigned] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes

2016-03-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-4794:
-

Assignee: Jian He

> Distributed shell app gets stuck on stopping containers after App completes
> ---
>
> Key: YARN-4794
> URL: https://issues.apache.org/jira/browse/YARN-4794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
>
> Distributed shell app gets stuck on stopping containers after App completes 
> with the following exception
> {code:title = app log}
> 16/03/10 23:19:41 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: NM Client is being stopped.
> 16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: Waiting for 
> eventDispatcherThread to be interrupted.
> 16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: eventDispatcherThread exited.
> 16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: Stopping NM client.
> 16/03/10 23:19:41 INFO impl.NMClientImpl: Clean up running containers on stop.
> 16/03/10 23:19:41 INFO impl.NMClientImpl: Stopping 
> container_e05_1457650296862_0046_01_08
> 16/03/10 23:19:41 INFO impl.NMClientImpl: ok, stopContainerInternal.. 
> container_e05_1457650296862_0046_01_08
> 16/03/10 23:19:41 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> yarn-sumana-1.novalocal:25454
> 16/03/10 23:19:41 WARN ipc.Client: Exception encountered while connecting to 
> the server : 
> java.nio.channels.ClosedByInterruptException
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:407)
>   at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>   at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>   at 
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558)
>   at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:723)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:722)
>   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1397)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy30.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>   at com.sun.proxy.$Proxy31.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:382)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:368)
>   at 
> 

[jira] [Commented] (YARN-4150) Failure in TestNMClient because nodereports were not available

2016-03-11 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191429#comment-15191429
 ] 

Robert Kanter commented on YARN-4150:
-

I'd say that the test failures were unrelated given the patch, except that 
TestNMClient was one of the failures.  I've kicked off another run because it 
looks like something funny must have happened on the last one.

> Failure in TestNMClient because nodereports were not available
> --
>
> Key: YARN-4150
> URL: https://issues.apache.org/jira/browse/YARN-4150
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-4150.001.patch
>
>
> Saw a failure in a test run
> https://builds.apache.org/job/PreCommit-YARN-Build/9010/testReport/
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.allocateContainers(TestNMClient.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:210)





[jira] [Created] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes

2016-03-11 Thread Sumana Sathish (JIRA)
Sumana Sathish created YARN-4794:


 Summary: Distributed shell app gets stuck on stopping containers 
after App completes
 Key: YARN-4794
 URL: https://issues.apache.org/jira/browse/YARN-4794
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sumana Sathish
Priority: Critical


Distributed shell app gets stuck on stopping containers after App completes 
with the following exception
{code:title = app log}
16/03/10 23:19:41 INFO distributedshell.ApplicationMaster: Application 
completed. Stopping running containers
16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: NM Client is being stopped.
16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: Waiting for 
eventDispatcherThread to be interrupted.
16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: eventDispatcherThread exited.
16/03/10 23:19:41 INFO impl.NMClientAsyncImpl: Stopping NM client.
16/03/10 23:19:41 INFO impl.NMClientImpl: Clean up running containers on stop.
16/03/10 23:19:41 INFO impl.NMClientImpl: Stopping 
container_e05_1457650296862_0046_01_08
16/03/10 23:19:41 INFO impl.NMClientImpl: ok, stopContainerInternal.. 
container_e05_1457650296862_0046_01_08
16/03/10 23:19:41 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
yarn-sumana-1.novalocal:25454
16/03/10 23:19:41 WARN ipc.Client: Exception encountered while connecting to 
the server : 
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:407)
at 
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at 
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
at 
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:723)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:722)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
at org.apache.hadoop.ipc.Client.call(Client.java:1397)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy30.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy31.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:382)
at 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:368)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 

[jira] [Commented] (YARN-4789) Provide helpful exception for non-PATH-like conflict with admin.user.env

2016-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191411#comment-15191411
 ] 

Hadoop QA commented on YARN-4789:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 9s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 53s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12792776/YARN-4789.001.patch |
| JIRA Issue | YARN-4789 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 0e37b3c48be5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 7542996 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  

[jira] [Commented] (YARN-998) Persistent resource change during NM/RM restart

2016-03-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191389#comment-15191389
 ] 

Junping Du commented on YARN-998:
-

Thanks Kenji for providing a sample patch. 
Several improvements should be made over the current sample patch:
1. We should load DynamicResourceConfiguration only once in 
ResourceTrackerService during initialization, instead of loading it every time 
an NM registers.
2. The config will be refreshed when the RMAdmin CLI "refreshNodeResources" is 
triggered, to make sure the NM restart case is addressed.
3. There is no need to persist the individual node update CLI 
"updateNodeResource" to the dynamic resource file. If users want it persisted, 
they should update the dynamic resource file as well as trigger the node CLI.
4. Some code duplicated with YARN-1911 should be removed.
Will deliver an updated patch soon.
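
For illustration only, a minimal sketch of the "load once at initialization, 
reload only on an admin refresh" pattern described in points 1 and 2 above; the 
class name and file handling are placeholders, not the actual 
DynamicResourceConfiguration code:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: names and the file parsing are assumptions, not the real patch.
public class DynamicResourceCache {
  // nodeId -> {memoryMB, vcores}, parsed from dynamic-resources.xml
  private volatile Map<String, int[]> nodeResources = new ConcurrentHashMap<>();

  /** Called once at ResourceTrackerService init, and again on an admin refresh. */
  public void reload() {
    Map<String, int[]> fresh = new ConcurrentHashMap<>();
    // ... parse dynamic-resources.xml into 'fresh' here (omitted) ...
    nodeResources = fresh;   // atomic swap; NM registration never triggers a reload
  }

  /** The NM registration path only reads the cached values. */
  public int[] lookup(String nodeId) {
    return nodeResources.get(nodeId);
  }
}
{code}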

> Persistent resource change during NM/RM restart
> ---
>
> Key: YARN-998
> URL: https://issues.apache.org/jira/browse/YARN-998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-998-sample.patch
>
>
> When NM is restarted by plan or from a failure, previous dynamic resource 
> setting should be kept for consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4789) Provide helpful exception for non-PATH-like conflict with admin.user.env

2016-03-11 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191386#comment-15191386
 ] 

Gera Shegalov commented on YARN-4789:
-

Thanks Jason, this is indeed the case. The question is whether we can make a 
small change quickly before the more involved MAPREDUCE-6491 is committed.

> Provide helpful exception for non-PATH-like conflict with admin.user.env
> 
>
> Key: YARN-4789
> URL: https://issues.apache.org/jira/browse/YARN-4789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-4789.001.patch
>
>
> Environment variables specified in mapreduce.admin.user.env are supposed to 
> be paths (class, shell, library) and they can be merged with the 
> user-provided values. However, it's also possible that the cluster admins 
> specify some non-PATH-like variable such as JAVA_HOME. In this case if there 
> is the same variable provided by the user, we'll get a concatenation that 
> does not make sense and is difficult to debug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4789) Provide helpful exception for non-PATH-like conflict with admin.user.env

2016-03-11 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191384#comment-15191384
 ] 

Gera Shegalov commented on YARN-4789:
-

The patch throws an exception only when both the user and the admin specified an 
environment variable that cannot be reconciled via concatenation, as happens with 
the various *PATHs; this is the intent of option 2.

Following option 1 would replace the user env, but that seemed to me to violate 
the spirit of this conf. This conf is designed to preserve the admin settings 
while allowing them to be overridden by the user. That's why I thought warning 
both sides about the misconfiguration is the best course of action here.

> Provide helpful exception for non-PATH-like conflict with admin.user.env
> 
>
> Key: YARN-4789
> URL: https://issues.apache.org/jira/browse/YARN-4789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-4789.001.patch
>
>
> Environment variables specified in mapreduce.admin.user.env are supposed to 
> be paths (class, shell, library) and they can be merged with the 
> user-provided values. However, it's also possible that the cluster admins 
> specify some non-PATH-like variable such as JAVA_HOME. In this case if there 
> is the same variable provided by the user, we'll get a concatenation that 
> does not make sense and is difficult to debug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4789) Provide helpful exception for non-PATH-like conflict with admin.user.env

2016-03-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191371#comment-15191371
 ] 

Jason Lowe commented on YARN-4789:
--

This looks related to MAPREDUCE-6491.

> Provide helpful exception for non-PATH-like conflict with admin.user.env
> 
>
> Key: YARN-4789
> URL: https://issues.apache.org/jira/browse/YARN-4789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-4789.001.patch
>
>
> Environment variables specified in mapreduce.admin.user.env are supposed to 
> be paths (class, shell, library) and they can be merged with the 
> user-provided values. However, it's also possible that the cluster admins 
> specify some non-PATH-like variable such as JAVA_HOME. In this case if there 
> is the same variable provided by the user, we'll get a concatenation that 
> does not make sense and is difficult to debug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4793) [Umbrella] Simplified API layer for services and beyond

2016-03-11 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-4793:
-

 Summary: [Umbrella] Simplified API layer for services and beyond
 Key: YARN-4793
 URL: https://issues.apache.org/jira/browse/YARN-4793
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


[See overview doc at YARN-4692, modifying and copy-pasting some of the relevant 
pieces and sub-section 3.3.2 to track the specific sub-item.]

Bringing a new service onto YARN today is not a simple experience. The APIs of 
existing frameworks are either too low-level (native YARN), require writing new 
code (for frameworks with programmatic APIs), or require writing a complex spec 
(for declarative frameworks).

In addition to building critical building blocks inside YARN (as part of other 
efforts at YARN-4692), we should also look at simplifying the user-facing story 
for building services. The experience of projects like Slider building real-life 
services like HBase, Storm, Accumulo, Solr etc. gives us some very good learnings 
on what simplified APIs for building services should look like.

To this end, we should look at a new simple-services API layer backed by REST 
interfaces. The REST layer can act as a single point of entry for creation and 
lifecycle management of YARN services. Services here can range from simple 
single-component apps to the most complex, multi-component applications with 
special orchestration needs.

We should also look at making this a unified REST-based entry point for other 
important features like resource-profile management (YARN-3926), 
package-definitions' lifecycle-management, and service-discovery (YARN-913 / 
YARN-4757). We also need to flesh out its relation to our present, much 
lower-level REST APIs (YARN-1695) in YARN for application-submission and 
management.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries

2016-03-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191351#comment-15191351
 ] 

Wangda Tan commented on YARN-4719:
--

Looks good to me in general, +1 

One nit: CS#reinitializeQueues:

{code}
root.reinitialize(newRoot, getClusterResource());
updatePlacementRules();

// Re-calculate headroom for active applications
Resource clusterResource = nodeTracker.getClusterCapacity();
{code}
Should use getClusterResource instead of nodeTracker.getClusterCapacity.

bq. Can we make ClusterNodeTracker an interface and make the current one in the 
latest patch a default implementation?
I would suggest keeping it as-is; we can refactor it in the future when we have 
to. Modifying the interface needs some additional effort, which we should avoid 
for now.

> Add a helper library to maintain node state and allows common queries
> -
>
> Key: YARN-4719
> URL: https://issues.apache.org/jira/browse/YARN-4719
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4719-1.patch, yarn-4719-2.patch, yarn-4719-3.patch, 
> yarn-4719-4.patch, yarn-4719-5.patch
>
>
> The scheduler could use a helper library to maintain node state and allowing 
> matching/sorting queries. Several reasons for this:
> # Today, a lot of the node state management is done separately in each 
> scheduler. Having a single library will take us that much closer to reducing 
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality 
> significantly. 
> # An API that returns a sorted list for a custom comparator would help 
> YARN-1011 where we want to sort by allocation and utilization for 
> continuous/asynchronous and opportunistic scheduling respectively. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4789) Provide helpful exception for non-PATH-like conflict with admin.user.env

2016-03-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191345#comment-15191345
 ] 

Sangjin Lee commented on YARN-4789:
---

Thanks for filing the issue and the patch [~jira.shegalov]! This definitely is 
a bug.

I also find the descriptions of the admin env properties very confusing:
{noformat}
<property>
  <name>mapreduce.admin.user.env</name>
  <value></value>
  <description>
  Expert: Additional execution environment entries for
  map and reduce task processes. This is not an additive property.
  You must preserve the original value if you want your map and
  reduce tasks to have access to native libraries (compression, etc).
  When this value is empty, the command to set execution
  envrionment will be OS dependent:
  For linux, use LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native.
  For windows, use PATH = %PATH%;%HADOOP_COMMON_HOME%\\bin.
  </description>
</property>

<property>
  <name>yarn.app.mapreduce.am.admin.user.env</name>
  <value></value>
  <description> Environment variables for the MR App Master
  processes for admin purposes. These values are set first and can be
  overridden by the user env (yarn.app.mapreduce.am.env) Example :
  1) A=foo  This will set the env variable A to foo
  2) B=$B:c This is inherit app master's B env variable.
  </description>
</property>
{noformat}

The statement about it being non-additive is confusing, as for PATH-like 
variables they are very much additive. Also, 
{{yarn.app.mapreduce.am.admin.user.env}} says 
bq. These values are set first and can be overridden by the user env 
(yarn.app.mapreduce.am.env)

So it seems to me that we should revisit these descriptions and improve them, 
especially if we are going to reject user values for non-PATH environment 
variables.

I only took a quick look at the patch, but it appears the exception will be 
thrown every time {{Apps.addToEnvironment()}} is called for a non-PATH 
environment variable? That would cause exceptions for more types of situations 
than we're targeting, right? For example, I suspect even the attempt to set 
{{yarn.app.mapreduce.am.admin.user.env}} would fail:

{code}
// Setup the environment variables for Admin first
MRApps.setEnvFromInputString(environment, 
conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV,
MRJobConfig.DEFAULT_MR_AM_ADMIN_USER_ENV), conf);
{code}

I think it'd be better to target this change more narrowly.
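
For illustration, a minimal sketch of one such narrower placement, where the 
conflict check runs once when user entries are merged over an admin-seeded 
environment rather than on every {{Apps.addToEnvironment()}} call; the names and 
the PATH-like classification here are assumptions, not the attached patch:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch only: not YARN-4789.001.patch; merge semantics are simplified.
public final class EnvMergeSketch {
  private EnvMergeSketch() {}

  static Map<String, String> mergeUserOverAdmin(Map<String, String> adminEnv,
      Map<String, String> userEnv, Set<String> pathLikeVars) {
    Map<String, String> merged = new HashMap<>(adminEnv);
    for (Map.Entry<String, String> e : userEnv.entrySet()) {
      String var = e.getKey();
      String userVal = e.getValue();
      if (adminEnv.containsKey(var) && !pathLikeVars.contains(var)) {
        // Only a genuine user-vs-admin collision on a non-PATH variable fails;
        // seeding the admin env itself never goes through this method.
        throw new IllegalArgumentException(var
            + " is set by the cluster admin configuration and cannot be overridden");
      }
      // PATH-like variables concatenate; everything else is simply the user value.
      merged.merge(var, userVal, (adminVal, newVal) -> newVal + ":" + adminVal);
    }
    return merged;
  }
}
{code}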

> Provide helpful exception for non-PATH-like conflict with admin.user.env
> 
>
> Key: YARN-4789
> URL: https://issues.apache.org/jira/browse/YARN-4789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-4789.001.patch
>
>
> Environment variables specified in mapreduce.admin.user.env are supposed to 
> be paths (class, shell, library) and they can be merged with the 
> user-provided values. However, it's also possible that the cluster admins 
> specify some non-PATH-like variable such as JAVA_HOME. In this case if there 
> is the same variable provided by the user, we'll get a concatenation that 
> does not make sense and is difficult to debug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries

2016-03-11 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191326#comment-15191326
 ] 

Arun Suresh commented on YARN-4719:
---

Can we make {{ClusterNodeTracker}} an interface and make the current one in the 
latest patch a default implementation?
I can foresee many use cases for having a pluggable implementation, e.g. an 
in-memory / external DB that can handle filtering / matching of nodes more 
efficiently.
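
For illustration only, a rough sketch of what such a split might look like (an 
interface plus a default in-memory implementation); the method names and 
generics are made up and do not reflect the ClusterNodeTracker in the attached 
patches:

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch, not the patch: an interface with a default in-memory impl.
interface NodeTracker<N> {
  void addNode(String nodeId, N node);
  void removeNode(String nodeId);
  List<N> getNodes(Predicate<N> filter);
  List<N> sortedNodes(Comparator<N> comparator);
}

class InMemoryNodeTracker<N> implements NodeTracker<N> {
  private final Map<String, N> nodes = new ConcurrentHashMap<>();

  @Override public void addNode(String nodeId, N node) { nodes.put(nodeId, node); }
  @Override public void removeNode(String nodeId) { nodes.remove(nodeId); }

  @Override public List<N> getNodes(Predicate<N> filter) {
    List<N> result = new ArrayList<>();
    for (N n : nodes.values()) {
      if (filter.test(n)) {
        result.add(n);
      }
    }
    return result;
  }

  @Override public List<N> sortedNodes(Comparator<N> comparator) {
    List<N> result = new ArrayList<>(nodes.values());
    result.sort(comparator);
    return result;
  }
}
{code}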

> Add a helper library to maintain node state and allows common queries
> -
>
> Key: YARN-4719
> URL: https://issues.apache.org/jira/browse/YARN-4719
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4719-1.patch, yarn-4719-2.patch, yarn-4719-3.patch, 
> yarn-4719-4.patch, yarn-4719-5.patch
>
>
> The scheduler could use a helper library to maintain node state and allowing 
> matching/sorting queries. Several reasons for this:
> # Today, a lot of the node state management is done separately in each 
> scheduler. Having a single library will take us that much closer to reducing 
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality 
> significantly. 
> # An API that returns a sorted list for a custom comparator would help 
> YARN-1011 where we want to sort by allocation and utilization for 
> continuous/asynchronous and opportunistic scheduling respectively. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-03-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4108:
-
Attachment: YARN-4108.8.patch

Attached ver.8, fixed javac warnings and test failures.

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, 
> YARN-4108.2.patch, YARN-4108.3.patch, YARN-4108.4.patch, YARN-4108.5.patch, 
> YARN-4108.6.patch, YARN-4108.7.patch, YARN-4108.8.patch, 
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch, YARN-4108.poc.3-WIP.patch, 
> YARN-4108.poc.4-WIP.patch
>
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*:
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross applicaiton preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4792) FairScheduler should do sanity checks on its configuration

2016-03-11 Thread Yufei Gu (JIRA)
Yufei Gu created YARN-4792:
--

 Summary: FairScheduler should do sanity checks on its configuration
 Key: YARN-4792
 URL: https://issues.apache.org/jira/browse/YARN-4792
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler, yarn
Affects Versions: 2.9.0
Reporter: Yufei Gu
Assignee: Yufei Gu


The FairScheduler does not perform any sanity checks on the configuration that 
it uses.
This can lead to a configuration that is legal but does not make sense to use in 
a cluster, and causes support cases to be filed. Examples:
* limiting the root queue to a certain size for memory or vcores: the cluster 
resources already limit the root queue, so there should be no need for an extra 
artificial limit
* setting max running applications on a leaf queue larger than on the parent 
queue(s)
* setting max running applications larger than the queue can accommodate 
(example: minimum vcore allocation * number of apps > total vcores in the queue)
There are possibly more checks.
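
For illustration only, a minimal sketch of the kind of checks listed above; the 
method shapes and warning wording are assumptions, not existing FairScheduler 
code:

{code}
// Hypothetical sketch: shows the shape of the proposed sanity checks only.
public class FairSchedulerSanityCheckSketch {

  /** A leaf queue's maxRunningApps above its parent's limit can never be reached. */
  static void checkMaxRunningApps(String queue, int leafMaxApps, int parentMaxApps) {
    if (leafMaxApps > parentMaxApps) {
      System.err.println("WARN: maxRunningApps for " + queue + " (" + leafMaxApps
          + ") exceeds its parent's limit (" + parentMaxApps + ")");
    }
  }

  /** maxRunningApps that cannot fit given the minimum allocation per app. */
  static void checkQueueFitsApps(String queue, int maxApps, int minVcoresPerApp,
      int queueVcores) {
    if ((long) maxApps * minVcoresPerApp > queueVcores) {
      System.err.println("WARN: " + queue + " cannot run " + maxApps + " apps at "
          + minVcoresPerApp + " vcores each with only " + queueVcores + " vcores");
    }
  }
}
{code}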



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4790) Per user blacklist node for user specific error for container launch failure.

2016-03-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191217#comment-15191217
 ] 

Vinod Kumar Vavilapalli commented on YARN-4790:
---

I agree with the problem statement but not necessarily the proposal. Please 
edit the title so that it highlights only the problem, so that we can then 
figure out what the solution should be.

What we need is to *not* penalize applications for system-related issues. When 
YARN finds a node with configuration / permission issues, it should itself take 
action to (a) avoid scheduling on that node, and (b) alert administrators.

Implementing heuristics for app / user level blacklisting to work around 
platform problems should be a last-ditch effort. We did that in Hadoop 1 
MapReduce because we didn't have a clear demarcation between app and system 
failures. But that isn't the case with YARN - part of the reason why we never 
implemented heuristics-based per-app blacklisting *in YARN* - we left that 
completely up to applications.

> Per user blacklist node for user specific error for container launch failure.
> -
>
> Key: YARN-4790
> URL: https://issues.apache.org/jira/browse/YARN-4790
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Junping Du
>
> There are some user-specific errors for container launch failures. For example,
> when LinuxContainerExecutor is enabled but a node does not have the user, the
> container launch fails with the following information:
> {noformat}
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1434045496283_0036_02 State change from LAUNCHED to FAILED 
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application 
> application_1434045496283_0036 failed 2 times due to AM Container for 
> appattempt_1434045496283_0036_02 exited with exitCode: -1000 due to: 
> Application application_1434045496283_0036 initialization failed 
> (exitCode=255) with output: User jdu not found 
> {noformat}
> Obviously, this node is not suitable for launching containers for this user's
> other applications. We need a per-user blacklist tracking mechanism rather
> than a per-application one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-11 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191211#comment-15191211
 ] 

Eric Badger commented on YARN-4686:
---

Thanks for finding this failure, [~eepayne]. 

I narrowed down the issue to being a race condition between the MiniYARNCluster 
being completely started and the reservation being placed via the test. When 
the CapacityScheduler starts up, it creates a PlanFollower (via 
startPlanFollower()). The thread created by startPlanFollower() executes the 
synchronizePlan() function in a loop. The main test code in 
TestYarnClient#testReservationAPIs is running in a different thread and calls 
submitReservation (TestYarnClient.java:1213) once the cluster is up and 
running. The race is between the synchronizePlan thread calling 
plan.setTotalCapacity (indirectly through CapacityScheduler.java:137) and the 
submitReservation thread calling plan.getTotalCapacity (indirectly through 
ReservationInputValidator.java:148). 

The patch that I submitted before makes sure that the MiniYARNCluster won't 
return until the CapacityScheduler has registered all of the nodes, but it 
doesn't wait for the totalCapacity to be set to the correct value. Is there a 
good way to make sure that the cluster start won't return until the scheduler 
has totalCapacity set to the correct value?
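
For illustration, one possible shape of such a wait: poll until the plan's total 
capacity reaches the expected value before the test proceeds. The capacity 
supplier here is a stand-in rather than the real MiniYARNCluster / 
CapacityScheduler API:

{code}
import java.util.concurrent.TimeoutException;
import java.util.function.LongSupplier;

// Sketch only: the real test would read the capacity from the scheduler's plan.
public class WaitForPlanCapacity {
  static void waitForTotalCapacity(LongSupplier totalCapacityMB, long expectedMB,
      long timeoutMs) throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (totalCapacityMB.getAsLong() < expectedMB) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("Plan capacity never reached " + expectedMB + " MB");
      }
      Thread.sleep(100);   // give the PlanFollower thread time to call synchronizePlan()
    }
  }
}
{code}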

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch
>
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4678) Cluster used capacity is > 100 when container reserved

2016-03-11 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191208#comment-15191208
 ] 

Sunil G commented on YARN-4678:
---

[~brahmareddy] Thanks for sharing the scenario.

We have made changes only to the parent queue in the current patch. We could 
have made a similar change in LeafQueue, but I had one concern: to select an 
under-served queue while processing a node heartbeat, we still consider 
usedCapacity (inclusive of reservedCapacity), so there might be a difference in 
how queue capacity is shown in these two cases. I still feel it's fine to go 
ahead with the UI changes, as reservation metrics are displayed on the LeafQueue 
page. Let me make a change to consider this scenario. Meanwhile, if you or 
[~bibinchundatt] see the above-mentioned scenario as a problem, please share 
your thoughts.

> Cluster used capacity is > 100 when container reserved 
> ---
>
> Key: YARN-4678
> URL: https://issues.apache.org/jira/browse/YARN-4678
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Sunil G
> Attachments: 0001-YARN-4678.patch, 0002-YARN-4678.patch
>
>
>  *Scenario:* 
> * Start cluster with Three NM's each having 8GB (cluster memory:24GB).
> * Configure queues with elasticity and userlimitfactor=10.
> * disable pre-emption.
> * run two job with different priority in different queue at the same time
> ** yarn jar hadoop-mapreduce-examples-2.7.2.jar pi -Dyarn.app.priority=LOW 
> -Dmapreduce.job.queuename=QueueA -Dmapreduce.map.memory.mb=4096 
> -Dyarn.app.mapreduce.am.resource.mb=1536 
> -Dmapreduce.job.reduce.slowstart.completedmaps=1.0 10 1
> ** yarn jar hadoop-mapreduce-examples-2.7.2.jar pi -Dyarn.app.priority=HIGH 
> -Dmapreduce.job.queuename=QueueB -Dmapreduce.map.memory.mb=4096 
> -Dyarn.app.mapreduce.am.resource.mb=1536 3 1
> * observe the cluster capacity which was used in RM web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4791) Per user blacklist node for user specific error for container launch failure.

2016-03-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-4791.
---
Resolution: Duplicate

Closing as duplicate.

> Per user blacklist node for user specific error for container launch failure.
> -
>
> Key: YARN-4791
> URL: https://issues.apache.org/jira/browse/YARN-4791
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Junping Du
>
> There are some user-specific errors for container launch failures. For example,
> when LinuxContainerExecutor is enabled but a node does not have the user, the
> container launch fails with the following information:
> {noformat}
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1434045496283_0036_02 State change from LAUNCHED to FAILED 
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application 
> application_1434045496283_0036 failed 2 times due to AM Container for 
> appattempt_1434045496283_0036_02 exited with exitCode: -1000 due to: 
> Application application_1434045496283_0036 initialization failed 
> (exitCode=255) with output: User jdu not found 
> {noformat}
> Obviously, this node is not suitable for launching containers for this user's
> other applications. We need a per-user blacklist tracking mechanism rather
> than a per-application one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4791) Per user blacklist node for user specific error for container launch failure.

2016-03-11 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191110#comment-15191110
 ] 

Sunil G commented on YARN-4791:
---

[~djp] Is its same as YARN-4790? 

> Per user blacklist node for user specific error for container launch failure.
> -
>
> Key: YARN-4791
> URL: https://issues.apache.org/jira/browse/YARN-4791
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Junping Du
>
> There are some user-specific errors for container launch failures. For example,
> when LinuxContainerExecutor is enabled but a node does not have the user, the
> container launch fails with the following information:
> {noformat}
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1434045496283_0036_02 State change from LAUNCHED to FAILED 
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application 
> application_1434045496283_0036 failed 2 times due to AM Container for 
> appattempt_1434045496283_0036_02 exited with exitCode: -1000 due to: 
> Application application_1434045496283_0036 initialization failed 
> (exitCode=255) with output: User jdu not found 
> {noformat}
> Obviously, this node is not suitable for launching containers for this user's
> other applications. We need a per-user blacklist tracking mechanism rather
> than a per-application one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4791) Per user blacklist node for user specific error for container launch failure.

2016-03-11 Thread Junping Du (JIRA)
Junping Du created YARN-4791:


 Summary: Per user blacklist node for user specific error for 
container launch failure.
 Key: YARN-4791
 URL: https://issues.apache.org/jira/browse/YARN-4791
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Reporter: Junping Du
Assignee: Junping Du


There are some user-specific errors for container launch failures. For example, 
when LinuxContainerExecutor is enabled but a node does not have the user, the 
container launch fails with the following information:
{noformat}
2016-02-14 15:37:03,111 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1434045496283_0036_02 State change from LAUNCHED to FAILED 
2016-02-14 15:37:03,111 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application 
application_1434045496283_0036 failed 2 times due to AM Container for 
appattempt_1434045496283_0036_02 exited with exitCode: -1000 due to: 
Application application_1434045496283_0036 initialization failed (exitCode=255) 
with output: User jdu not found 
{noformat}
Obviously, this node is not suitable for launching containers for this user's 
other applications. We need a per-user blacklist tracking mechanism rather than 
a per-application one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4790) Per user blacklist node for user specific error for container launch failure.

2016-03-11 Thread Junping Du (JIRA)
Junping Du created YARN-4790:


 Summary: Per user blacklist node for user specific error for 
container launch failure.
 Key: YARN-4790
 URL: https://issues.apache.org/jira/browse/YARN-4790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Reporter: Junping Du
Assignee: Junping Du


There are some user-specific errors for container launch failures. For example, 
when LinuxContainerExecutor is enabled but a node does not have the user, the 
container launch fails with the following information:
{noformat}
2016-02-14 15:37:03,111 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1434045496283_0036_02 State change from LAUNCHED to FAILED 
2016-02-14 15:37:03,111 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application 
application_1434045496283_0036 failed 2 times due to AM Container for 
appattempt_1434045496283_0036_02 exited with exitCode: -1000 due to: 
Application application_1434045496283_0036 initialization failed (exitCode=255) 
with output: User jdu not found 
{noformat}
Obviously, this node is not suitable for launching containers for this user's 
other applications. We need a per-user blacklist tracking mechanism rather than 
a per-application one.
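
For illustration only, a minimal sketch of what a per-user blacklist could 
track; the class and method names are hypothetical and not part of any attached 
patch:

{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-user (rather than per-application) blacklist.
class UserBlacklistSketch {
  // user -> nodes where that user's container localization/launch failed
  private final Map<String, Set<String>> badNodesByUser = new ConcurrentHashMap<>();

  void recordUserLaunchFailure(String user, String nodeId) {
    badNodesByUser.computeIfAbsent(user, u -> ConcurrentHashMap.newKeySet()).add(nodeId);
  }

  /** The scheduler would skip these nodes for this user's future containers. */
  boolean isBlacklistedFor(String user, String nodeId) {
    Set<String> nodes = badNodesByUser.get(user);
    return nodes != null && nodes.contains(nodeId);
  }
}
{code}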



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line

2016-03-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-313:

Release Note: 
After this patch, the feature to support dynamic NM resource configuration is 
complete, so that users can configure an NM with new resources without bringing 
the NM down or decommissioning it.
Two CLIs are provided to support updating resources on an individual node or a 
batch of nodes:
1. Update resources on a single node: yarn rmadmin -updateNodeResource [NodeID] 
[MemSize] [vCores] 
2. Update resources on a batch of nodes: yarn rmadmin -refreshNodesResources, 
which reflects the nodes' resource configuration defined in 
dynamic-resources.xml, loaded by the RM dynamically (like capacity-scheduler.xml 
or fair-scheduler.xml). 
The first version of the configuration format is:

<configuration>
  <property>
    <name>yarn.resource.dynamic.nodes</name>
    <value>h1:1234</value>
  </property>
  <property>
    <name>yarn.resource.dynamic.h1:1234.vcores</name>
    <value>16</value>
  </property>
  <property>
    <name>yarn.resource.dynamic.h1:1234.memory</name>
    <value>1024</value>
  </property>
</configuration>

  was:
Since this patch, we are providing CLIs to support dynamic NM resource 
configuration, so that users can configure an NM with new resources without 
bringing the NM down or decommissioning it.
Two CLIs are provided to support updating resources on an individual node or a 
batch of nodes:
1. Update resources on a single node: yarn rmadmin -updateNodeResource [NodeID] 
[MemSize] [vCores] 
2. Update resources on a batch of nodes: yarn rmadmin -refreshNodesResources, 
which reflects the nodes' resource configuration defined in 
dynamic-resources.xml; the format of the configuration was optimized in 
YARN-4160.


> Add Admin API for supporting node resource configuration in command line
> 
>
> Key: YARN-313
> URL: https://issues.apache.org/jira/browse/YARN-313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, graceful
>Reporter: Junping Du
>Assignee: Inigo Goiri
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
> YARN-313-v10.patch, YARN-313-v11.patch, YARN-313-v2.patch, YARN-313-v3.patch, 
> YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch, 
> YARN-313-v8.patch, YARN-313-v9.patch
>
>
> We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" 
> to support changes of node's resource specified in a config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190870#comment-15190870
 ] 

Hadoop QA commented on YARN-2883:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 26 new + 
500 unchanged - 4 fixed = 526 total (was 504) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 16 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 1s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 44s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. 

[jira] [Commented] (YARN-4789) Provide helpful exception for non-PATH-like conflict with admin.user.env

2016-03-11 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190773#comment-15190773
 ] 

Gera Shegalov commented on YARN-4789:
-

I see the following options to deal with it:
# silently ignore/replace the user-provided value with the one in admin.env
# inform the user that the variable is provided by the cluster admins

The 001 patch implements the latter.
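
For illustration only, a minimal sketch of option 2 (informing the user); the 
PATH-like test and the message wording are assumptions, not the contents of 
YARN-4789.001.patch:

{code}
import java.util.Map;

// Hypothetical sketch of the "inform the user" option described above.
public final class AdminEnvConflict {
  private AdminEnvConflict() {}

  static boolean isPathLike(String name) {
    return name.endsWith("PATH");   // e.g. CLASSPATH, LD_LIBRARY_PATH
  }

  static void check(Map<String, String> adminEnv, String var, String userValue) {
    String adminValue = adminEnv.get(var);
    if (adminValue != null && !isPathLike(var) && !adminValue.equals(userValue)) {
      throw new IllegalArgumentException("Environment variable " + var
          + " is provided by the cluster admins as '" + adminValue
          + "'; concatenating it with the user value '" + userValue
          + "' would not make sense");
    }
  }
}
{code}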

> Provide helpful exception for non-PATH-like conflict with admin.user.env
> 
>
> Key: YARN-4789
> URL: https://issues.apache.org/jira/browse/YARN-4789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-4789.001.patch
>
>
> Environment variables specified in mapreduce.admin.user.env are supposed to 
> be paths (class, shell, library) and they can be merged with the 
> user-provided values. However, it's also possible that the cluster admins 
> specify some non-PATH-like variable such as JAVA_HOME. In this case if there 
> is the same variable provided by the user, we'll get a concatenation that 
> does not make sense and is difficult to debug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4789) Provide helpful exception for non-PATH-like conflict with admin.user.env

2016-03-11 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated YARN-4789:

Attachment: YARN-4789.001.patch

> Provide helpful exception for non-PATH-like conflict with admin.user.env
> 
>
> Key: YARN-4789
> URL: https://issues.apache.org/jira/browse/YARN-4789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-4789.001.patch
>
>
> Environment variables specified in mapreduce.admin.user.env are supposed to 
> be paths (class, shell, library) and they can be merged with the 
> user-provided values. However, it's also possible that the cluster admins 
> specify some non-PATH-like variable such as JAVA_HOME. In this case if there 
> is the same variable provided by the user, we'll get a concatenation that 
> does not make sense and is difficult to debug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4789) Provide helpful exception for non-PATH-like conflict with admin.user.env

2016-03-11 Thread Gera Shegalov (JIRA)
Gera Shegalov created YARN-4789:
---

 Summary: Provide helpful exception for non-PATH-like conflict with 
admin.user.env
 Key: YARN-4789
 URL: https://issues.apache.org/jira/browse/YARN-4789
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.2
Reporter: Gera Shegalov
Assignee: Gera Shegalov


Environment variables specified in mapreduce.admin.user.env are supposed to be 
paths (class, shell, library) and they can be merged with the user-provided 
values. However, it's also possible that the cluster admins specify some 
non-PATH-like variable such as JAVA_HOME. In this case if there is the same 
variable provided by the user, we'll get a concatenation that does not make 
sense and is difficult to debug.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190704#comment-15190704
 ] 

Hadoop QA commented on YARN-4108:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 5s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74
 with JDK v1.8.0_74 generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) 
{color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 56 new + 476 unchanged - 15 fixed = 532 total (was 491) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 21s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 20s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 152m 45s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Updated] (YARN-2883) Queuing of container requests in the NM

2016-03-11 Thread Konstantinos Karanasos (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-2883:
-
Attachment: YARN-2883-trunk.004.patch

Attaching patch that applies directly to current trunk.

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-trunk.004.patch, 
> YARN-2883-yarn-2877.001.patch, YARN-2883-yarn-2877.002.patch, 
> YARN-2883-yarn-2877.003.patch, YARN-2883-yarn-2877.004.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)