[jira] [Updated] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-11 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4344:

Attachment: YARN-4344.001.patch

Uploaded a patch with the fix. [~zxu], [~jlowe] - can you please take a look?

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-4344.001.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> there can arise situations where the overall cluster resource calculation 
> will be incorrect, leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4241) Fix typo of property name in yarn-default.xml

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000238#comment-15000238
 ] 

Hudson commented on YARN-4241:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2534 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2534/])
YARN-4241. Fix typo of property name in yarn-default.xml. Contributed by 
(aajisaka: rev 23d0db551cc63def9acbab2473e58fb1c52f85e0)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


> Fix typo of property name in yarn-default.xml
> -
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>  Labels: newbie
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4241.002.patch, YARN-4241.003.patch, 
> YARN-4241.branch-2.7.patch, YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4241) Fix typo of property name in yarn-default.xml

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000111#comment-15000111
 ] 

Hudson commented on YARN-4241:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2595 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2595/])
YARN-4241. Fix typo of property name in yarn-default.xml. Contributed by 
(aajisaka: rev 23d0db551cc63def9acbab2473e58fb1c52f85e0)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


> Fix typo of property name in yarn-default.xml
> -
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>  Labels: newbie
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4241.002.patch, YARN-4241.003.patch, 
> YARN-4241.branch-2.7.patch, YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000770#comment-15000770
 ] 

Wangda Tan commented on YARN-4287:
--

Thanks [~nroberts].
The patch looks good, +1. Will commit in a few days if there are no objections.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This JIRA proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have a 
> significant impact on things like CombineFileInputFormat, which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become a reality, so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal and offSwitch (today there is only one)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 
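
A minimal, stand-alone sketch of the two proposed changes (not the actual patch; 
the class and field names here are hypothetical):
{code}
// Hypothetical illustration of the two changes proposed above; not from the patch.
public class LocalityDelayPolicy {
  private final int nodeLocalityDelay;  // missed opportunities before allowing RACK_LOCAL
  private final int rackLocalityDelay;  // separate threshold before allowing OFF_SWITCH
  private boolean lastAssignmentWasRackLocal = false;

  public LocalityDelayPolicy(int nodeLocalityDelay, int rackLocalityDelay) {
    this.nodeLocalityDelay = nodeLocalityDelay;
    this.rackLocalityDelay = rackLocalityDelay;
  }

  // Change 1: rack-local and off-switch assignments use separate delays.
  public boolean canAssignOffSwitch(int missedOpportunities) {
    return missedOpportunities >= rackLocalityDelay;
  }

  // Change 2: once rack-local assignments start flowing, don't keep delaying them.
  public boolean canAssignRackLocal(int missedOpportunities) {
    return lastAssignmentWasRackLocal || missedOpportunities >= nodeLocalityDelay;
  }

  public void recordAssignment(boolean wasRackLocal) {
    lastAssignmentWasRackLocal = wasRackLocal;
  }
}
{code}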



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4345:
-
Target Version/s: 2.8.0  (was: 2.8.0, 2.7.3)

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> YARN-313 added a CLI to update node resources. It works fine for batch mode 
> updates. However, for a single node update, "yarn rmadmin -updateNodeResource" 
> fails to work because the resource is not set properly in the request being sent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du moved MAPREDUCE-6544 to YARN-4345:
-

Affects Version/s: 2.7.1  (was: 2.7.1)
 Target Version/s: 2.8.0, 2.7.3  (was: 2.7.3)
  Component/s: resourcemanager  (was: resourcemanager)
  Key: YARN-4345  (was: MAPREDUCE-6544)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> YARN-313 added a CLI to update node resources. It works fine for batch mode 
> updates. However, for a single node update, "yarn rmadmin -updateNodeResource" 
> fails to work because the resource is not set properly in the request being sent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4345:
-
Reporter: Sushmitha Sreenivasan  (was: Junping Du)

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
>Priority: Critical
>
> YARN-313 added a CLI to update node resources. It works fine for batch mode 
> updates. However, for a single node update, "yarn rmadmin -updateNodeResource" 
> fails to work because the resource is not set properly in the request being sent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4345:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-291

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
>Priority: Critical
>
> YARN-313 added a CLI to update node resources. It works fine for batch mode 
> updates. However, for a single node update, "yarn rmadmin -updateNodeResource" 
> fails to work because the resource is not set properly in the request being sent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4345:
-
Attachment: YARN-4345.patch

The problem is quite straightforward. Uploading a quick patch to fix it.
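
For reference, a minimal sketch of how a single-node request is expected to carry 
the resource (an illustration of the reported gap, not the patch itself):
{code}
import java.util.Collections;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceOption;
import org.apache.hadoop.yarn.server.api.protocolrecords.UpdateNodeResourceRequest;

public class SingleNodeResourceUpdate {
  public static UpdateNodeResourceRequest buildRequest(String host, int port,
      int memoryMb, int vCores, int overCommitTimeout) {
    NodeId nodeId = NodeId.newInstance(host, port);
    Resource resource = Resource.newInstance(memoryMb, vCores);
    // Per the report, the single-node CLI path fails because the resource is not
    // set properly; the map entry below is the piece that must be populated.
    Map<NodeId, ResourceOption> nodeResourceMap = Collections.singletonMap(
        nodeId, ResourceOption.newInstance(resource, overCommitTimeout));
    return UpdateNodeResourceRequest.newInstance(nodeResourceMap);
  }
}
{code}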

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4345.patch
>
>
> YARN-313 added a CLI to update node resources. It works fine for batch mode 
> updates. However, for a single node update, "yarn rmadmin -updateNodeResource" 
> fails to work because the resource is not set properly in the request being sent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3784) Indicate preemption timeout along with the list of containers to AM (preemption message)

2015-11-11 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3784:
--
Attachment: 0004-YARN-3784.patch

Rebasing the patch as it has gone stale. [~djp], could you please take a look?

> Indicate preemption timeout along with the list of containers to AM 
> (preemption message)
> ---
>
> Key: YARN-3784
> URL: https://issues.apache.org/jira/browse/YARN-3784
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch, 
> 0003-YARN-3784.patch, 0004-YARN-3784.patch
>
>
> Currently during preemption, the AM is notified with a list of containers that 
> are marked for preemption. This JIRA introduces a timeout duration along with 
> this container list so that the AM knows how much time it has to do a 
> graceful shutdown of its containers (assuming a preemption policy is 
> loaded in the AM).
> This will help in NM decommissioning scenarios, where the NM will be 
> decommissioned after a timeout (also killing the containers on it). The timeout 
> indicates to the AM that those containers can be killed forcefully by the RM 
> after it expires.
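
A rough sketch of how an AM might consume the proposed timeout together with the 
existing preemption message (the timeout parameter is hypothetical; today's 
message carries only the container list):
{code}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.PreemptionContainer;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;

public class PreemptionMessageHandler {
  // preemptionTimeoutMs is the hypothetical value this JIRA proposes to add.
  public void onAllocate(AllocateResponse response, long preemptionTimeoutMs) {
    PreemptionMessage msg = response.getPreemptionMessage();
    if (msg == null || msg.getContract() == null) {
      return;
    }
    for (PreemptionContainer c : msg.getContract().getContainers()) {
      // The AM has roughly preemptionTimeoutMs to drain or checkpoint work on
      // this container before the RM may kill it forcefully.
      scheduleGracefulShutdown(c.getId(), preemptionTimeoutMs);
    }
  }

  private void scheduleGracefulShutdown(ContainerId id, long timeoutMs) {
    // Application-specific: checkpoint state, stop accepting new work, etc.
  }
}
{code}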



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-11-11 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4132:
---
Attachment: YARN-4132.5.patch

Thanks [~jlowe] for the review! The .5 patch has only one createRMProxy, which 
takes two additional inputs: retry time and retry interval. ServerRMProxy and 
ClientRMProxy pass those two inputs based on different values in the conf.
The conf naming is fixed, and the test is also tuned down to around 4 seconds. 
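
As a rough illustration of how those two inputs typically map onto a retry policy 
(not necessarily the exact code in the patch):
{code}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public class RmConnectRetryExample {
  // "Retry time" bounds how long we keep trying; "retry interval" is the fixed
  // sleep between attempts. NMs and clients can pass different conf values here.
  public static RetryPolicy buildPolicy(long maxWaitMs, long retryIntervalMs) {
    return RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
        maxWaitMs, retryIntervalMs, TimeUnit.MILLISECONDS);
  }
}
{code}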

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.3.patch, YARN-4132.4.patch, 
> YARN-4132.5.patch, YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. At a minimum, we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM, 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000521#comment-15000521
 ] 

Jason Lowe commented on YARN-4344:
--

Thanks for the patch, Varun!  I think the change will fix the reported issue, 
but I'm a bit skeptical of the vastly different handling of the event based on 
whether apps are running or not.  For example, if the http port is changing 
when the node re-registers, why are we treating it as a node removal then 
addition if there aren't any apps running but not if there are apps running?  
Seems like that should be consistent.

Comments on the patch itself:

The comment about sending the node removal event at the start of the main block 
in the transition is no longer very accurate.
 
Please don't put large sleeps (on the order of seconds) in tests.  These extra 
sleep seconds quickly add up to a significant amount of time over many tests.  
If we need to sleep for polling reasons, the sleep should be much shorter, like 
on the order of 10ms.  Better than sleep-polling is flushing the event 
dispatcher and then checking, since we can avoid polling entirely.
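
For example, a test wired up with the DrainDispatcher test utility can flush all 
pending events and then assert, with no sleeps at all (a sketch, assuming the 
test can inject the dispatcher):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.event.DrainDispatcher;

public class DispatcherFlushSketch {
  public static void main(String[] args) {
    DrainDispatcher dispatcher = new DrainDispatcher();
    dispatcher.init(new Configuration());
    dispatcher.start();

    // ... register handlers and fire the reconnect/node events under test ...

    dispatcher.await();  // blocks until every queued event has been handled
    // ... assert on node state / cluster resource here, no polling needed ...

    dispatcher.stop();
  }
}
{code}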

Nit: isCapabilityChanged init can be simplified to the following, similar to 
the noRunningApps boolean init above it:
{code}
  boolean isCapabilityChanged =
  !rmNode.getTotalCapability().equals(newNode.getTotalCapability());
 {code}

Nit: is this conditional check even necessary?  We can just update the total 
capability with no semantic effect if it hasn't changed.  Since this is just 
updating a reference with another precomputed one, it's not like we're avoiding 
some expensive code. ;-)
{code}
if (isCapabilityChanged) {
  rmNode.totalCapability = newNode.getTotalCapability();
}
{code}

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-4344.001.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> there can arise situations where the overall cluster resource calculation 
> will be incorrect, leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-11-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000598#comment-15000598
 ] 

Junping Du commented on YARN-3223:
--

Thanks [~brookz] for updating the patch! I just finished my review; here are my 
comments:
1. RMNodeImpl.java,
{code}
+
+  /**
+   * Set total resource for the node.
+   * @param totalResource {@link Resource} New total resource
+   */
+  void setTotalResource(Resource totalResource);
{code}
We should use RMNodeResourceUpdateEvent to update the resource instead of 
manipulating the resource on RMNode directly. Actually, a very similar interface 
(differing only in name) was added in YARN-311 but got removed in YARN-312, 
because RMNode is supposed to be a read-only interface and all internal state 
changes are triggered by events instead of direct calls. Please refer to a 
similar comment here: 
(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087)
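
In other words, something along these lines instead of a setter (a sketch only; 
the exact wiring depends on where the update originates):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceOption;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeResourceUpdateEvent;

public class ResourceUpdateViaEvent {
  // Post an RMNodeResourceUpdateEvent and let RMNodeImpl's transition update the
  // node's state, rather than mutating the (read-only) RMNode directly.
  public static void postResourceUpdate(RMContext rmContext, RMNode node,
      Resource newTotal, int overCommitTimeout) {
    rmContext.getDispatcher().getEventHandler().handle(
        new RMNodeResourceUpdateEvent(node.getNodeID(),
            ResourceOption.newInstance(newTotal, overCommitTimeout)));
  }
}
{code}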

2. AbstractYarnScheduler.java
{code}
+  /**
+   * Update the cluster resource based on a particular node's changes.
+   */
+  private synchronized void updateClusterResources(RMNode nm,
+  SchedulerNode node, Resource oldResource, Resource newResource) {
+// Notify NodeLabelsManager about this change
+rmContext.getNodeLabelManager().updateNodeResource(nm.getNodeID(),
+newResource);
+
+// Log resource change
+LOG.info("Update resource on node: " + node.getNodeName()
++ " from: " + oldResource + ", to: "
++ newResource);
+
+nodes.remove(nm.getNodeID());
+updateMaximumAllocation(node, false);
+
+// if node is decommissioning, preserve original total resource
+if (nm.getState() == NodeState.DECOMMISSIONING){
+  node.setOriginalTotalResource(oldResource);
+}
+// update resource to node
+node.setTotalResource(newResource);
+
+// Update RMNode
+nm.setTotalResource(newResource);
+
+nodes.put(nm.getNodeID(), (N)node);
+updateMaximumAllocation(node, true);
+
+// update resource to clusterResource
+Resources.subtractFrom(clusterResource, oldResource);
+Resources.addTo(clusterResource, newResource);
+  }
{code}
Instead of setting the new resource directly on RMNode and SchedulerNode, we 
should add a new transition (which can be combined with 
UpdateNodeResourceWhenRunningTransition) for RMNodeImpl to handle RESOURCE_UPDATE 
for a node in the decommissioning stage. It will handle the scheduler node 
resource update and cluster resource update there (plus some other logic we could 
be missing here). For recovering the previous capacity after recommission, I 
would prefer we keep the original resource on the RMNode side instead of the 
SchedulerNode, as the value belongs more to RMNode itself and the state 
transition from running to decommissioning will take a snapshot of the resource 
at that moment. The original resource can be updated if a node resource update is 
triggered from the CLI, but not when some running container finishes.

BTW, it would be nice to have tests covering the key scenarios below: 
1. After an NM with running containers is put into the decommissioning stage, its 
total capacity can be updated to its used capacity, but the original capacity is 
kept.
2. The total capacity gets updated when some running container finishes.
3. After recommissioning, the total capacity of the NM goes back to the original 
resource.

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, 
> YARN-3223-v2.patch
>
>
> During NM graceful decommission, we should handle the resource update properly, 
> including: making RMNode keep track of the old resource for a possible rollback, 
> keeping the available resource at 0, and updating the used resource when a 
> container finishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4241) Fix typo of property name in yarn-default.xml

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000133#comment-15000133
 ] 

Hudson commented on YARN-4241:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #667 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/667/])
YARN-4241. Fix typo of property name in yarn-default.xml. Contributed by 
(aajisaka: rev 23d0db551cc63def9acbab2473e58fb1c52f85e0)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


> Fix typo of property name in yarn-default.xml
> -
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>  Labels: newbie
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4241.002.patch, YARN-4241.003.patch, 
> YARN-4241.branch-2.7.patch, YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4241) Fix typo of property name in yarn-default.xml

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000197#comment-15000197
 ] 

Hudson commented on YARN-4241:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #655 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/655/])
YARN-4241. Fix typo of property name in yarn-default.xml. Contributed by 
(aajisaka: rev 23d0db551cc63def9acbab2473e58fb1c52f85e0)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Fix typo of property name in yarn-default.xml
> -
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>  Labels: newbie
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4241.002.patch, YARN-4241.003.patch, 
> YARN-4241.branch-2.7.patch, YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-11 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-4344:
---

 Summary: NMs reconnecting with changed capabilities can lead to 
wrong cluster resource calculations
 Key: YARN-4344
 URL: https://issues.apache.org/jira/browse/YARN-4344
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.2, 2.7.1
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical


After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
there can arise situations where the overall cluster resource calculation 
will be incorrect, leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-11 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000229#comment-15000229
 ] 

Varun Vasudev commented on YARN-4344:
-

An example of a situation is shown below -
{code}
2015-11-09 10:43:51,784 INFO  resourcemanager.ResourceTrackerService 
(ResourceTrackerService.java:registerNodeManager(345)) - NodeManager from node 
10.0.0.64(cmPort: 30050 httpPort: 30060) registered with capability: 
, assigned nodeId 10.0.0.64:30050
2015-11-09 10:43:51,786 INFO  rmnode.RMNodeImpl (RMNodeImpl.java:handle(434)) - 
10.0.0.64:30050 Node Transitioned from NEW to RUNNING
2015-11-09 10:43:51,814 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:addNode(1193)) - Added node 10.0.0.64:30050 
clusterResource: 
2015-11-09 10:44:37,878 INFO  util.RackResolver 
(RackResolver.java:coreResolve(109)) - Resolved 10.0.0.63 to /default-rack
2015-11-09 10:44:37,879 INFO  resourcemanager.ResourceTrackerService 
(ResourceTrackerService.java:registerNodeManager(345)) - NodeManager from node 
10.0.0.63(cmPort: 30050 httpPort: 30060) registered with capability: 
, assigned nodeId 10.0.0.63:30050
2015-11-09 10:44:37,879 INFO  rmnode.RMNodeImpl (RMNodeImpl.java:handle(434)) - 
10.0.0.63:30050 Node Transitioned from NEW to RUNNING
2015-11-09 10:44:37,882 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:addNode(1193)) - Added node 10.0.0.63:30050 
clusterResource: 
2015-11-09 10:44:39,307 INFO  util.RackResolver 
(RackResolver.java:coreResolve(109)) - Resolved 10.0.0.64 to /default-rack
2015-11-09 10:44:39,309 INFO  resourcemanager.ResourceTrackerService 
(ResourceTrackerService.java:registerNodeManager(313)) - Reconnect from the 
node at: 10.0.0.64
2015-11-09 10:44:39,312 INFO  resourcemanager.ResourceTrackerService 
(ResourceTrackerService.java:registerNodeManager(345)) - NodeManager from node 
10.0.0.64(cmPort: 30050 httpPort: 30060) registered with capability: 
, assigned nodeId 10.0.0.64:30050
2015-11-09 10:44:39,314 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:removeNode(1247)) - Removed node 10.0.0.64:30050 
clusterResource: 
2015-11-09 10:44:39,315 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:addNode(1193)) - Added node 10.0.0.64:30050 
clusterResource: 
{code}

In this case, NMs from 10.0.0.64 and 10.0.0.63 registered, leading to a total 
cluster resource of clusterResource: . After that, 
10.0.0.64 re-connected with changed capabilities (from  
to ). This should have led to the cluster resources 
becoming , but instead they are calculated to be 
.

The root cause is this piece of code from RMNodeImpl -
{code}
rmNode.context.getDispatcher().getEventHandler().handle(
  new NodeRemovedSchedulerEvent(rmNode));

if (!rmNode.getTotalCapability().equals(
 newNode.getTotalCapability())) {
   rmNode.totalCapability = newNode.getTotalCapability();
{code}

If the dispatcher is delayed in its processing of the event, then by the time the 
node removal is processed, rmNode.totalCapability = 
newNode.getTotalCapability() has already been executed, and the resources that 
are removed are the changed capabilities rather than the older capabilities of 
the node.
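
A stand-alone toy illustrating the ordering problem (not RM code): if the 
asynchronously handled event reads the live field, a delayed dispatch sees the 
new value; handing the handler a snapshot taken at dispatch time avoids that.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class DelayedDispatchToy {
  static final AtomicReference<String> totalCapability =
      new AtomicReference<>("old-capability");

  public static void main(String[] args) throws Exception {
    ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    // Racy pattern: the handler reads the shared field whenever it finally runs.
    dispatcher.submit(() ->
        System.out.println("removed (live read): " + totalCapability.get()));

    // Safer pattern: capture the value at dispatch time and pass it along.
    String snapshot = totalCapability.get();
    dispatcher.submit(() ->
        System.out.println("removed (snapshot): " + snapshot));

    // Meanwhile the reconnect transition overwrites the capability.
    totalCapability.set("new-capability");

    dispatcher.shutdown();
  }
}
{code}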



> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> there can arise situations where the overall cluster resource calculation 
> will be incorrect, leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000361#comment-15000361
 ] 

Hadoop QA commented on YARN-4344:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 49s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 1s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 132m 2s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-11 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771722/YARN-4344.001.patch |
| JIRA Issue | YARN-4344 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux a9687b820f5f 3.13.0-36-lowlatency #63-Ubuntu 

[jira] [Commented] (YARN-4241) Fix typo of property name in yarn-default.xml

2015-11-11 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000432#comment-15000432
 ] 

Tsuyoshi Ozawa commented on YARN-4241:
--

Thanks Anthony for your contribution and thanks Akira for committing.

> Fix typo of property name in yarn-default.xml
> -
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>  Labels: newbie
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4241.002.patch, YARN-4241.003.patch, 
> YARN-4241.branch-2.7.patch, YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-11-11 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4304:
--
Attachment: 0001-YARN-4304.patch

Uploading an initial version of the patch.

- For cluster metrics, I still followed the approach of available+allocated to 
get the total resource of a queue/cluster. So whenever a resource update happens, 
I think we need to recalculate {{availableMB}} (and cores) based on all 
partitions in that queue. This patch follows that idea (see the sketch below).
- In the Scheduler page, all AM-resource-related fixes are done. I will attach a 
screenshot soon.

Kindly help to check the patch.
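
A small sketch of the available+allocated idea summed across partitions (the 
per-partition holder here is hypothetical, just to show the arithmetic):
{code}
import java.util.Collection;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class QueueTotalFromPartitions {
  // Hypothetical per-partition snapshot used only for this illustration.
  public static class PartitionSnapshot {
    final Resource available;
    final Resource allocated;
    public PartitionSnapshot(Resource available, Resource allocated) {
      this.available = available;
      this.allocated = allocated;
    }
  }

  // Total resource of a queue = sum over its partitions of (available + allocated).
  public static Resource totalResource(Collection<PartitionSnapshot> partitions) {
    Resource total = Resources.createResource(0, 0);
    for (PartitionSnapshot p : partitions) {
      Resources.addTo(total, p.available);
      Resources.addTo(total, p.allocated);
    }
    return total;
  }
}
{code}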

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch
>
>
> As we are supporting per-partition max AM resource percentage 
> configuration, the UI and various metrics also need to display the correct 
> configuration for it. 
> For example, the current UI still shows the AM resource percentage at the queue 
> level. This is to be updated correctly when label configuration is used.
> - Display max-am-percentage per partition in the Scheduler UI (labels too) and 
> in the ClusterMetrics page
> - Update queue/partition related metrics w.r.t. per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-11 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4348:
-
Affects Version/s: 2.6.2

> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out, 
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.
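
A generic sketch of the proposed bound (not ZKRMStateStore internals): wait for 
the resync signal for zkResyncWaitTime rather than the session timeout.
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ResyncWaitSketch {
  // Returns true if the resync completed within zkResyncWaitTimeMs.
  public static boolean waitForResync(CountDownLatch resyncSignal,
      long zkResyncWaitTimeMs) throws InterruptedException {
    return resyncSignal.await(zkResyncWaitTimeMs, TimeUnit.MILLISECONDS);
  }
}
{code}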



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception

2015-11-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001467#comment-15001467
 ] 

Jian He commented on YARN-4347:
---

Thanks for reviewing. I fixed the spacing.
I think it's clear enough without the assertion message, as the code comments are 
there. 

> Resource manager fails with Null pointer exception
> --
>
> Key: YARN-4347
> URL: https://issues.apache.org/jira/browse/YARN-4347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yesha Vora
>Assignee: Jian He
> Attachments: YARN-4347.1.patch, YARN-4347.2.patch
>
>
> Resource manager fails with NPE while trying to load or recover a finished 
> application. 
> {code}
> 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(597)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4350) TestDistributedShell fails

2015-11-11 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4350:
-

 Summary: TestDistributedShell fails
 Key: YARN-4350
 URL: https://issues.apache.org/jira/browse/YARN-4350
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
There seem to be 2 distinct issues.

(1) testDSShellWithoutDomainV2* tests fail sporadically
These tests fail more often than not when run by themselves:
{noformat}
testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 30.998 sec  <<< FAILURE!
java.lang.AssertionError: Application created event should be published atleast 
once expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
{noformat}

They started happening after YARN-4129. I suspect this might have to do with a 
timing issue.

(2) the whole test times out
If you run the whole TestDistributedShell test, it times out without fail. This 
may or may not have to do with the port change introduced by YARN-2859 (just a 
hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-4349) Support CallerContext in YARN

2015-11-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan moved HDFS-9418 to YARN-4349:


Key: YARN-4349  (was: HDFS-9418)
Project: Hadoop YARN  (was: Hadoop HDFS)

> Support CallerContext in YARN
> -
>
> Key: YARN-4349
> URL: https://issues.apache.org/jira/browse/YARN-4349
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> For more details about CallerContext, please refer to the description of HDFS-9184.
> From YARN's perspective, we should make the following changes:
> - RMAuditLogger logs the application's caller context when the application is 
> submitted by the user
> - Add the caller context to the application's data in ATS along with the 
> application creation event
> Protocol and RPC changes are done in HDFS-9184.
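
For reference, a minimal sketch of the CallerContext API from HDFS-9184 that the 
changes above would build on (how YARN wires it into RMAuditLogger/ATS is exactly 
what this JIRA proposes and is not shown here):
{code}
import org.apache.hadoop.ipc.CallerContext;

public class CallerContextSketch {
  public static void main(String[] args) {
    // A caller (e.g. a workflow engine) tags its calls with a context string.
    CallerContext context =
        new CallerContext.Builder("example_workflow_run_1").build();
    CallerContext.setCurrent(context);

    // CallerContext is held in a thread-local; in a real RPC the propagated value
    // is read the same way on the server side and can then be audited/published.
    CallerContext received = CallerContext.getCurrent();
    System.out.println("caller context: "
        + (received != null ? received.getContext() : "none"));
  }
}
{code}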



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001455#comment-15001455
 ] 

Naganarasimha G R commented on YARN-4234:
-

Hi [~jlowe] & [~gtCarrera], 
AFAIK, YARN-4183 wanted to address a little more than what is captured by 
[~gtCarrera] (related to whether the client should fail when the timeline server 
is not reachable).
But what I don't understand (or am missing), as per the current code (ATS v1), is 
that only if *"yarn.timeline-service.enabled"* is enabled are the delegation 
tokens created automatically in YarnClientImpl. So I'm not sure why [~jeagles] 
was pointing out in YARN-4183 that all apps were trying to get delegation tokens. 
I understand the SystemMetricsPublisher (server) need not check this 
configuration, but I'm not sure why the client should depend on some server-side 
configuration.
Additionally, we already have the client-side configuration 
*"yarn.timeline-service.client.best-effort"* (realized it now), which, when 
configured, will not fail {{YarnClient.submitApplication}} if getting the 
delegation tokens fails.

Hope my understanding of the problem is correct. If so, then as part of 
YARN-4183 we just need to remove the check for 
*"yarn.timeline-service.enabled"* in SystemMetricsPublisher. Please correct me 
if I am wrong.

And coming to this patch (support for 1.5), I can envisage it as follows:
* The server will be configured with *"TIMELINE_SERVICE_VERSION"*, based on which 
the appropriate timeline handler is selected
* Client apps that want to communicate with the timeline server will enable 
*"yarn.timeline-service.enabled"* 
* If security and *"yarn.timeline-service.enabled"* are enabled, then the 
delegation token is obtained in the yarn client as before
* When the timeline client is initialized, it contacts the server for the version 
and related configs, and once it receives them it initializes itself
* If the user tries to use methods that are invalid for the server's timeline 
version, then the timeline client throws an exception (a rough sketch follows 
below)

Thoughts?
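
A very rough sketch of the version check the last two bullets imply (hypothetical; 
today there is no client/server version negotiation, so the version here is just 
read from configuration):
{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineVersionCheckSketch {
  public static float effectiveVersion(Configuration conf) {
    if (!conf.getBoolean("yarn.timeline-service.enabled", false)) {
      throw new IllegalStateException("Timeline service is disabled on the client");
    }
    // In the envisaged flow this value would come from the server instead.
    return conf.getFloat("yarn.timeline-service.version", 1.0f);
  }

  public static void requireAtLeast(float required, float actual) {
    if (actual < required) {
      throw new UnsupportedOperationException("This API needs timeline service v"
          + required + " but the server runs v" + actual);
    }
  }
}
{code}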

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015.2.patch, YARN-4234.1.patch, 
> YARN-4234.2.patch, YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001581#comment-15001581
 ] 

Hudson commented on YARN-4183:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #598 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/598/])
YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 
6351d3fa638f1d901279cef9e56dc4e07ef3de11)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java


> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001621#comment-15001621
 ] 

Naganarasimha G R commented on YARN-4234:
-

Thanks [~gtCarrera], it would be helpful if you could also verify that 
{{yarn.timeline-service.enabled}} is a client-only config, as explained in the 
documentation. From my side, apart from the RM's SystemMetricsPublisher using it, 
I don't see any other code in the server using it. 
Any thoughts about the approach for getting the version and related conf from 
the server side?
Also, I have commented the same in YARN-4183.

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015.2.patch, YARN-4234.1.patch, 
> YARN-4234.2.patch, YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4349) Support CallerContext in YARN

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001695#comment-15001695
 ] 

Hadoop QA commented on YARN-4349:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 10 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 51s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 33s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 2s 
{color} | {color:red} Patch generated 10 new checkstyle issues in root (total 
was 489, now 492). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 24 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 28s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 0s {color} | 
{color:red} hadoop-common in the patch failed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 31s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 56s 
{color} | {color:green} hadoop-mapreduce-client-app in the patch passed with 
JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 48s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_79. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} unit {color} | 

[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001416#comment-15001416
 ] 

Hudson commented on YARN-4183:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2536 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2536/])
YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 
6351d3fa638f1d901279cef9e56dc4e07ef3de11)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java


> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001454#comment-15001454
 ] 

Hadoop QA commented on YARN-4347:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 237, now 239). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 23s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 41s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 151m 5s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.0 Server=1.7.0 
Image:test-patch-base-hadoop-date2015-11-11 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771854/YARN-4347.1.patch |
| JIRA Issue | YARN-4347 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux bfd0aa1707b4 

[jira] [Commented] (YARN-2480) DockerContainerExecutor must support user namespaces

2015-11-11 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001707#comment-15001707
 ] 

Abin Shahab commented on YARN-2480:
---

Thanks, yes, it's interesting, but it demands a contiguous id space for all
tasks/docker containers. We are wondering how to distribute the ids among
the tasks (do all tasks get the same range? Do they each get a separate
range? Do all tasks belonging to the same job share a range?)




> DockerContainerExecutor must support user namespaces
> 
>
> Key: YARN-2480
> URL: https://issues.apache.org/jira/browse/YARN-2480
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Abin Shahab
>  Labels: security
>
> When DockerContainerExecutor launches a container, the root inside that 
> container has root privileges on the host. 
> This is insecure in a multi-tenant environment. The uid of the container's 
> root user must be mapped to a non-privileged user on the host.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-11 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4348:
-
Attachment: YARN-4348.001.patch

The issue looks similar to YARN-3753. One workaround is to change the timeout 
for the sync to zkResyncWaitTime, as [~jianhe] did in YARN-3753. Attaching a 
patch for this.

Even if the timeout is increased, the probability of hitting this case decreases, 
but it can still happen, e.g. when the ZK server's reply to the sync is dropped 
after the operation itself has succeeded.
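
To illustrate the approach (a minimal sketch only, not the actual ZKRMStateStore 
code; the class, method, and parameter names below are made up for illustration):

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.ZooKeeper;

public class SyncWaitSketch {
  /**
   * Issue an async ZooKeeper sync() and wait for its reply using the
   * (shorter) resync-wait timeout rather than the ZK session timeout.
   * Returns false when the reply does not arrive in time, so the caller
   * can fall back to the existing "sync failed" handling.
   */
  static boolean syncWithResyncWait(ZooKeeper zk, String path,
      long resyncWaitTimeMs) throws InterruptedException {
    final CountDownLatch latch = new CountDownLatch(1);
    zk.sync(path, (rc, p, ctx) -> latch.countDown(), null);
    return latch.await(resyncWaitTimeMs, TimeUnit.MILLISECONDS);
  }
}
{code}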



> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out, 
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4349) Support CallerContext in YARN

2015-11-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4349:
-
Description: 
More details about CallerContext please refer to description of HDFS-9184.

From YARN's perspective, we should make following changes:
- RMAuditLogger logs application's caller context when application submit by 
user
- Add caller context to application's data in ATS along with application 
creation event

From MR's perspective:
- Set AppMaster container's context to YARN's application Id
- Set Mapper/Reducer containers' context to task attempt id

Protocol and RPC changes are done in HDFS-9184.

  was:
More details about CallerContext please refer to description of HDFS-9184.

From YARN's perspective, we should make following changes:
- RMAuditLogger logs application's caller context when application submit by 
user
- Add caller context to application's data in ATS along with application 
creation event

Protocol and RPC changes are done in HDFS-9184.


> Support CallerContext in YARN
> -
>
> Key: YARN-4349
> URL: https://issues.apache.org/jira/browse/YARN-4349
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4349.1.patch
>
>
> More details about CallerContext please refer to description of HDFS-9184.
> From YARN's perspective, we should make following changes:
> - RMAuditLogger logs application's caller context when application submit by 
> user
> - Add caller context to application's data in ATS along with application 
> creation event
> From MR's perspective:
> - Set AppMaster container's context to YARN's application Id
> - Set Mapper/Reducer containers' context to task attempt id
> Protocol and RPC changes are done in HDFS-9184.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4347) Resource manager fails with Null pointer exception

2015-11-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4347:
--
Attachment: YARN-4347.2.patch

> Resource manager fails with Null pointer exception
> --
>
> Key: YARN-4347
> URL: https://issues.apache.org/jira/browse/YARN-4347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yesha Vora
>Assignee: Jian He
> Attachments: YARN-4347.1.patch, YARN-4347.2.patch
>
>
> Resource manager fails with NPE while trying to load or recover a finished 
> application. 
> {code}
> 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(597)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001607#comment-15001607
 ] 

Naganarasimha G R commented on YARN-4183:
-

Hi All,
I lately observed that there is already an existing configuration, 
{{"yarn.timeline-service.client.best-effort"}}, which ensures that YARNClient 
does not throw an exception even if delegation token fetching fails in 
{{YARNClient.SubmitApplication}}. So I feel there is a sufficient guard on the 
client side for deciding when to create the TimelineClient, when to fetch the 
delegation token, and what action to take if that fails.
So as part of this jira, as mentioned by [~vinodkv], I think we just need to 
remove the check on {{yarn.timeline-service.enable}} used in 
{{SystemMetricsPublisher}}, since it is a client-side configuration (correct me 
if I am wrong). And as part of YARN-4234 we are anyway concentrating on how to 
support multiple versions in TimelineClient, right?
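
For reference, the client-side guard mentioned above is just a yarn-site.xml 
entry (a sketch based on the property name referenced above; the value is shown 
for illustration):

{code}
<!-- If fetching the timeline delegation token fails, YarnClient treats it as
     best effort instead of failing application submission. -->
<property>
  <name>yarn.timeline-service.client.best-effort</name>
  <value>true</value>
</property>
{code}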

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-11-11 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-3769:
-
Attachment: YARN-3769.005.patch

[~leftnoteasy], Attaching YARN-3769.005.patch with the changes we discussed.

I have another question that may be an enhancement:
In {{LeafQueue#getTotalPendingResourcesConsideringUserLimit}}, the calculation 
of headroom is as follows in this patch:
{code}
Resource headroom = Resources.subtract(
computeUserLimit(app, resources, user, partition,
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY),
user.getUsed(partition));
{code}
Would it be more efficient to just do the following?
{code}
 Resource headroom =
Resources.subtract(user.getUserResourceLimit(), user.getUsed());
{code}

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-3769-branch-2.002.patch, 
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, 
> YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch, 
> YARN-3769.003.patch, YARN-3769.004.patch, YARN-3769.005.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001666#comment-15001666
 ] 

Hadoop QA commented on YARN-4347:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 7s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 237, now 239). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 6s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 17s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 145m 32s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-12 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771892/YARN-4347.2.patch |
| JIRA Issue | YARN-4347 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux 0ae13b980ea7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 

[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering

2015-11-11 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001725#comment-15001725
 ] 

sandflee commented on YARN-4051:


thanks [~jlowe]

Should the value be infinite by default? The concern is that if one container 
has issues recovering (due to log aggregation woes or whatever) then we risk 
expiring all of the containers on this node if we don't re-register with the RM 
within the node expiry interval. I think it makes sense if we have also fixed 
the recovery paths so there aren't potentially long-running procedures (like 
contacting HDFS) during the recovery process. If we haven't then we could 
create as many problems as we're solving by waiting forever.
-- Agreed! I share this concern.

Why does the patch change the check interval? If it's to reduce the logging 
then we can better fix that by only logging when the status changes rather than 
every iteration.
-- Yes, it was to reduce the logging; since recovery is usually very fast, I 
have changed it back.

 Nit: A value of zero should also be treated as a disabled max time.
-- So zero means the NM registers with the RM at once, whether or not container 
recovery has completed, yes?

Nit: "Max time to wait NM to complete container recover before register to RM " 
should be "Max time NM will wait to complete container recovery before 
registering with the RM".
-- corrected



> ContainerKillEvent is lost when container is  In New State and is recovering
> 
>
> Key: YARN-4051
> URL: https://issues.apache.org/jira/browse/YARN-4051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
>Priority: Critical
> Attachments: YARN-4051.01.patch, YARN-4051.02.patch, 
> YARN-4051.03.patch, YARN-4051.04.patch
>
>
> As in YARN-4050, NM event dispatcher is blocked, and container is in New 
> state, when we finish application, the container still alive even after NM 
> event dispatcher is unblocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering

2015-11-11 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4051:
---
Attachment: YARN-4051.05.patch

Set the default timeout to 2 minutes, since the default NM expiry timeout is 10 
minutes.
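
For illustration, the setting being discussed would look roughly like this in 
yarn-site.xml (a sketch only: the property name below is hypothetical, and the 
real key and default are whatever the committed patch defines):

{code}
<property>
  <!-- Hypothetical key for illustration; use the key defined by the patch. -->
  <name>yarn.nodemanager.container-recovery.max-wait-ms</name>
  <!-- 2 minutes, well under the default 10-minute NM expiry interval. -->
  <value>120000</value>
  <description>Max time NM will wait to complete container recovery before
  registering with the RM.</description>
</property>
{code}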

> ContainerKillEvent is lost when container is  In New State and is recovering
> 
>
> Key: YARN-4051
> URL: https://issues.apache.org/jira/browse/YARN-4051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
>Priority: Critical
> Attachments: YARN-4051.01.patch, YARN-4051.02.patch, 
> YARN-4051.03.patch, YARN-4051.04.patch, YARN-4051.05.patch
>
>
> As in YARN-4050, NM event dispatcher is blocked, and container is in New 
> state, when we finish application, the container still alive even after NM 
> event dispatcher is unblocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader

2015-11-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3862:
--
Attachment: YARN-3862-feature-YARN-2928.wip.03.patch

Testing the new feature branch name (feature-YARN-2928).

> Decide which contents to retrieve and send back in response in TimelineReader
> -
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-YARN-2928.wip.03.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma separated list of configs/metrics to be returned would be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001470#comment-15001470
 ] 

Hadoop QA commented on YARN-3862:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
56s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
47s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-yarn-server-timelineservice in feature-YARN-2928 
failed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 43 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 (total was 102, now 128). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed 
with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 55s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 22s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
32s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 49s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-12 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771888/YARN-3862-feature-YARN-2928.wip.03.patch
 |
| JIRA Issue | YARN-3862 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux 530c24bb2676 

[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-11 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4348:
-
Description: 
The current internal ZK configuration of ZKRMStateStore can cause the following 
situation:

1. syncInternal times out, 
2. but the sync succeeds later on.

We should use zkResyncWaitTime as the timeout value.

  was:
The current internal ZK configuration of ZKRMStateStore can cause a following 
situation:

1. syncInternal timeouts, 
2. but sync succeeded later on.

{quote}
2015-11-11 11:54:05,728 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1241)) - Failed to sync with ZK new 
connection.  -<--- sync failed
2015-11-11 11:54:05,728 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1244)) - Maxed out ZK retries. Giving up!
2015-11-11 11:54:05,728 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(292)) - Error updating appAttempt: 
appattempt_1447242474882_0002_01
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1447242474882_0002/appattempt_1447242474882_0002_01
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1082)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1079)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1164)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1197)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:1079)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:716)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:286)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:269)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:1006)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1075)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1070)
  at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
  at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
  at java.lang.Thread.run(Thread.java:745)
2015-11-11 11:54:05,729 ERROR recovery.RMStateStore 
(RMStateStore.java:notifyStoreOperationFailed(1027)) - State store operation 
failed
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1447242474882_0002/appattempt_1447242474882_0002_01
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1082)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1079)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1164)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1197)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:1079)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:716)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:286)
  at 

[jira] [Commented] (YARN-4349) Support CallerContext in YARN

2015-11-11 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001472#comment-15001472
 ] 

Mingliang Liu commented on YARN-4349:
-

Thanks for working on this, [~leftnoteasy].

Minor comment:
{code}
+  CallerContext context =
+  new CallerContext(new CallerContext.Builder(pbContext.getContext())
+  .setSignature(pbContext.getSignature().toByteArray()));
{code}
I think this code can be simplified as:
{code}
+  CallerContext context =
+  new CallerContext.Builder(pbContext.getContext())
+  .setSignature(pbContext.getSignature().toByteArray()).build();
{code}

Is it a good idea to make the constructor 
{{CallerContext#CallerContext(Builder)}} private?
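
To show what making that constructor private buys (a simplified sketch of the 
builder shape being discussed, not the real CallerContext class):

{code}
public final class CallerContext {
  private final String context;
  private final byte[] signature;

  // Private: the only way to obtain an instance is Builder#build(), so all
  // construction stays in one place.
  private CallerContext(Builder builder) {
    this.context = builder.context;
    this.signature = builder.signature;
  }

  public static final class Builder {
    private final String context;
    private byte[] signature;

    public Builder(String context) {
      this.context = context;
    }

    public Builder setSignature(byte[] signature) {
      this.signature = signature;
      return this;
    }

    public CallerContext build() {
      return new CallerContext(this);
    }
  }
}
{code}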

> Support CallerContext in YARN
> -
>
> Key: YARN-4349
> URL: https://issues.apache.org/jira/browse/YARN-4349
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4349.1.patch
>
>
> More details about CallerContext please refer to description of HDFS-9184.
> From YARN's perspective, we should make following changes:
> - RMAuditLogger logs application's caller context when application submit by 
> user
> - Add caller context to application's data in ATS along with application 
> creation event
> From MR's perspective:
> - Set AppMaster container's context to YARN's application Id
> - Set Mapper/Reducer containers' context to task attempt id
> Protocol and RPC changes are done in HDFS-9184.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-11-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001491#comment-15001491
 ] 

Naganarasimha G R commented on YARN-4350:
-

Thanks [~sjlee0] for pointing this issue out, I will take a look at it!

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not when run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4350) TestDistributedShell fails

2015-11-11 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-4350:
---

Assignee: Naganarasimha G R

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not when run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4327) RM can not renew TIMELINE_DELEGATION_TOKEN in secure clusters

2015-11-11 Thread Cheng-Hsin Cho (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001654#comment-15001654
 ] 

Cheng-Hsin Cho commented on YARN-4327:
--

Did you try using yarn.timeline-service.http-authentication.type=kerberos?
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/ref-c2f35f55-fa15-4154-b80a-36df2db297d5.1.html
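
i.e., in yarn-site.xml, switching the value shown in the description below from 
{{simple}} to {{kerberos}}:

{code}
<property>
  <name>yarn.timeline-service.http-authentication.type</name>
  <value>kerberos</value>
</property>
{code}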

> RM can not renew  TIMELINE_DELEGATION_TOKEN in secure clusters
> --
>
> Key: YARN-4327
> URL: https://issues.apache.org/jira/browse/YARN-4327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.1
> Environment: hadoop 2.7.1; hdfs, yarn, mrhistoryserver, and ATS all use 
> kerberos security.
> The conf is like this:
> <property>
>   <name>hadoop.security.authorization</name>
>   <value>true</value>
>   <description>Is service-level authorization enabled?</description>
> </property>
> <property>
>   <name>hadoop.security.authentication</name>
>   <value>kerberos</value>
>   <description>Possible values are simple (no authentication), and kerberos</description>
> </property>
>Reporter: zhangshilong
>
> bin hadoop 2.7.1
> ATS conf like this: 
> <property>
>   <name>yarn.timeline-service.http-authentication.type</name>
>   <value>simple</value>
> </property>
> <property>
>   <name>yarn.timeline-service.http-authentication.kerberos.principal</name>
>   <value>HTTP/_h...@xxx.com</value>
> </property>
> <property>
>   <name>yarn.timeline-service.http-authentication.kerberos.keytab</name>
>   <value>/etc/hadoop/keytabs/xxx.keytab</value>
> </property>
> <property>
>   <name>yarn.timeline-service.principal</name>
>   <value>xxx/_h...@xxx.com</value>
> </property>
> <property>
>   <name>yarn.timeline-service.keytab</name>
>   <value>/etc/hadoop/keytabs/xxx.keytab</value>
> </property>
> <property>
>   <name>yarn.timeline-service.best-effort</name>
>   <value>true</value>
> </property>
> <property>
>   <name>yarn.timeline-service.enabled</name>
>   <value>true</value>
> </property>
> I'd like to allow everyone to access ATS over HTTP, as with the RM and HDFS.
> The client can submit a job to the RM and add a TIMELINE_DELEGATION_TOKEN to 
> the AM context, but the RM cannot renew the TIMELINE_DELEGATION_TOKEN, which 
> causes the application to fail.
> RM logs:
> 2015-11-03 11:58:38,191 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, 
> Service: 10.12.38.4:8188, Ident: (owner=yarn-test, renewer=yarn-test, 
> realUser=, issueDate=1446523118046, maxDate=1447127918046, sequenceNumber=9, 
> masterKeyId=2)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:439)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:847)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:828)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: HTTP status [500], message [Null user]
> at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:287)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:212)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:396)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:378)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:451)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:183)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:466)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:400)
> at 
> 

[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001476#comment-15001476
 ] 

Li Lu commented on YARN-4234:
-

Hi [~Naganarasimha], I'm not extremely familiar with the delegation token stuff 
in ATS v1, but perhaps you'd like to raise these comments on YARN-4183? 

Also, I'd like to double-check whether {{yarn.timeline-service.enabled}} is a 
client-only config. 

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015.2.patch, YARN-4234.1.patch, 
> YARN-4234.2.patch, YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4184) Remove update reservation state api from state store as its not used by ReservationSystem

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001625#comment-15001625
 ] 

Hadoop QA commented on YARN-4184:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 42s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 3s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 129m 1s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-12 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771140/YARN-4184.v1.patch |
| JIRA Issue | YARN-4184 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux f52476d30bd3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001722#comment-15001722
 ] 

Hadoop QA commented on YARN-3769:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 47s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 35s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestRM |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-12 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771905/YARN-3769.005.patch |
| JIRA Issue | YARN-3769 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux dbfe7410cae9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 

[jira] [Commented] (YARN-4050) NM event dispatcher may blocked by LogAggregationService if NameNode is slow

2015-11-11 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001734#comment-15001734
 ] 

sandflee commented on YARN-4050:


There may be two problems:
1. The NM dispatcher may be blocked by the log aggregation service; should we move 
log aggregation events to a separate event dispatcher (rough sketch below)?
2. NM recovery is blocked, and there are some bad effects, as described in YARN-4051.
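To make option 1 concrete, here is a minimal sketch of what a dedicated dispatcher 
could look like; the wiring and names (logAggregationDispatcher, reusing 
LogHandlerEventType) are assumptions for illustration, not part of any attached patch.

{code}
// Illustrative only: give log-aggregation events their own AsyncDispatcher so
// a slow or dead NameNode cannot stall the main NM event dispatcher.
final AsyncDispatcher logAggregationDispatcher = new AsyncDispatcher();
logAggregationDispatcher.register(LogHandlerEventType.class,
    logAggregationService);   // the handler that may block on HDFS calls
logAggregationDispatcher.init(conf);
logAggregationDispatcher.start();

// The main dispatcher only forwards; it never waits on HDFS itself.
mainDispatcher.register(LogHandlerEventType.class,
    new EventHandler<LogHandlerEvent>() {
      @Override
      public void handle(LogHandlerEvent event) {
        logAggregationDispatcher.getEventHandler().handle(event);
      }
    });
{code}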

> NM event dispatcher may blocked by LogAggregationService if NameNode is slow
> 
>
> Key: YARN-4050
> URL: https://issues.apache.org/jira/browse/YARN-4050
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>
> env: NM restart with log aggregation enabled.
> The NN is almost dead; when we restart the NM, the NM event dispatcher is blocked 
> until the NN returns to normal. It seems the NM recovers the app and sends an 
> APPLICATION_START event to the log aggregation service, which then checks the log 
> dir permission in HDFS (BLOCKED).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-11 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4348:
-
Description: 
The current internal ZK configuration of ZKRMStateStore can cause the following 
situation:

1. syncInternal times out,
2. but the sync succeeds later on.

We should use zkResyncWaitTime as the timeout value.

  was:
The current internal ZK configuration of ZKRMStateStore can cause the following 
situation:

1. syncInternal times out,
2. but the sync succeeds later on.

We should use 


> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>
> The current internal ZK configuration of ZKRMStateStore can cause the following 
> situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.
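As a rough illustration of the proposed change (the field and method names below 
are assumptions, not the exact ZKRMStateStore code), the idea is to wait on the 
resync for the configured zkResyncWaitTime rather than the ZK session timeout:

{code}
// Sketch only: wait for the asynchronous ZooKeeper sync() to complete using
// zkResyncWaitTime instead of zkSessionTimeout.
private boolean syncInternal() throws InterruptedException {
  final CountDownLatch syncLatch = new CountDownLatch(1);
  // ... issue the asynchronous sync(), counting the latch down in its callback ...

  // Before: return syncLatch.await(zkSessionTimeout, TimeUnit.MILLISECONDS);
  return syncLatch.await(zkResyncWaitTime, TimeUnit.MILLISECONDS);
}
{code}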



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-11 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4348:
-
Attachment: log.txt

Attaching a log file.

> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: log.txt
>
>
> The current internal ZK configuration of ZKRMStateStore can cause the following 
> situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-11 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4348:
-
Description: 
Jian mentioned that the current internal ZK configuration of ZKRMStateStore can 
cause the following situation:

1. syncInternal times out,
2. but the sync succeeds later on.

We should use zkResyncWaitTime as the timeout value.

  was:
The current internal ZK configuration of ZKRMStateStore can cause the following 
situation:

1. syncInternal times out,
2. but the sync succeeds later on.

We should use zkResyncWaitTime as the timeout value.


> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4349) Support CallerContext in YARN

2015-11-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4349:
-
Attachment: YARN-4349.1.patch

Attached initial patch for review. Included all changes in description.

> Support CallerContext in YARN
> -
>
> Key: YARN-4349
> URL: https://issues.apache.org/jira/browse/YARN-4349
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4349.1.patch
>
>
> For more details about CallerContext, please refer to the description of HDFS-9184.
> From YARN's perspective, we should make the following changes:
> - RMAuditLogger logs the application's caller context when the application is 
> submitted by a user
> - Add the caller context to the application's data in ATS along with the 
> application creation event
> Protocol and RPC changes are done in HDFS-9184.
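For illustration, the RMAuditLogger part could look roughly like the following; 
CallerContext.getCurrent() comes from HDFS-9184, while the extra logSuccess 
parameter shown here is an assumption about the shape of the change, not the 
content of the attached patch.

{code}
// Illustrative sketch: capture the caller context at submission time and
// include it in the RM audit log (the parameter shape is an assumption).
CallerContext callerContext = CallerContext.getCurrent();
RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
    "ClientRMService", applicationId, callerContext);
{code}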



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001484#comment-15001484
 ] 

Hadoop QA commented on YARN-3862:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
5s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
50s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-yarn-server-timelineservice in feature-YARN-2928 
failed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 43 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 (total was 102, now 128). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s 
{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed 
with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 49s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 25s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
33s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 5s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-12 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771888/YARN-3862-feature-YARN-2928.wip.03.patch
 |
| JIRA Issue | YARN-3862 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux 54f9ecd19690 

[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations

2015-11-11 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001800#comment-15001800
 ] 

zhihai xu commented on YARN-4344:
-

Thanks for reporting this issue [~vvasudev]! Thanks for the review [~Jason 
Lowe]! 
[~rohithsharma] tried to clean up the code at YARN-3286. Based on the following 
comment from [~jianhe] at YARN-3286,
{code}
I think this has changed the behavior that without any RM/NM restart features 
enabled, earlier restarting a node will trigger RM to kill all the containers 
on this node, but now it won't ?
{code}
The patch may cause a compatibility issue. Maybe we can merge the 
{{rmNode.getHttpPort() == newNode.getHttpPort()}} case with the 
{{rmNode.getHttpPort() != newNode.getHttpPort()}} case when the node has no 
running apps (noRunningApps); a rough sketch of the resource accounting involved 
follows below.
Thoughts?
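A minimal sketch of that resource accounting (illustrative only, not the attached 
patch; field and method names are assumptions about the RMNode reconnect path):

{code}
// Illustrative only: whichever branch a reconnect takes, the scheduler's total
// cluster resource must be adjusted whenever the node's capability changed,
// otherwise clusterResource drifts away from reality.
Resource oldCapability = rmNode.getTotalCapability();
Resource newCapability = newNode.getTotalCapability();
if (!oldCapability.equals(newCapability)) {
  // update the RMNode's capability and notify the scheduler, e.g. via a
  // node-resource-update event or a remove-and-re-add of the node
  rmNode.totalCapability = newCapability;
}
{code}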

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> --
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-4344.001.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> there can arise situations where the overall cluster resource calculation for 
> the cluster will be incorrect leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001783#comment-15001783
 ] 

Hadoop QA commented on YARN-4051:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 19s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 59s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 265, now 265). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_60. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 4s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 0s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_79. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 8s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_79. 

[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001334#comment-15001334
 ] 

Jason Lowe commented on YARN-4234:
--

One issue with having the client contact the server is that it adds a dependency 
that ATS v1.5 is explicitly trying to remove.  Today we are running jobs that work 
just fine whether the ATS is up or not.  If the client needs to fetch anything from 
the ATS, then jobs stop flowing as soon as the ATS is down.  Since ATS v1 and v1.5 
are not HA, that's undesirable.  It puts the ATS on the critical path for jobs 
flowing through the cluster.

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception

2015-11-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001359#comment-15001359
 ] 

Wangda Tan commented on YARN-4347:
--

+1, pending Jenkins.

> Resource manager fails with Null pointer exception
> --
>
> Key: YARN-4347
> URL: https://issues.apache.org/jira/browse/YARN-4347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yesha Vora
>Assignee: Jian He
> Attachments: YARN-4347.1.patch
>
>
> Resource manager fails with NPE while trying to load or recover a finished 
> application. 
> {code}
> 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(597)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001364#comment-15001364
 ] 

Hadoop QA commented on YARN-4345:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 33s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_60. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 36s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_79. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 109m 11s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_60 Timed out junit tests | 
org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_79 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_79 Timed out junit tests | 
org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-11 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771848/YARN-4345-v2.patch |
| JIRA Issue | YARN-4345 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  

[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001380#comment-15001380
 ] 

Li Lu commented on YARN-4234:
-

I think there are some miscommunications here; please feel free to correct me if 
I'm wrong. There are actually two problems we're discussing:
- For the ATS v1.5 entity write patch, the client should not contact the server. 
This way we offload traffic from the centralized server, which became a 
significant problem in ATS v1.
- For the ATS compatibility issue raised in YARN-4183, we need to come up with a 
mechanism to let the clients and the server coordinate. Specifically, the client 
may want to know the server's ATS version, or where exactly it needs to write the 
timeline entities. Client-side configuration may not be consistent with the 
server in this case. So maybe we want some mechanism to coordinate on this?

Not sure if I got the points right, but to me those two points are independent?

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception

2015-11-11 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001381#comment-15001381
 ] 

Daniel Templeton commented on YARN-4347:


A couple of quibbles on the patch:

{code}
 RMApp rmApp =appAttempt.rmContext.getRMApps().get(
{code}

needs a space on the other side of the equals.

Also, your asserts in the test class should have a failure message as the first 
parameter.
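For context, the null guard under discussion could look roughly like this (a 
hypothetical sketch, not the attached YARN-4347.1.patch):

{code}
// Hypothetical sketch: when recovering an attempt whose application has
// already finished and is no longer tracked, skip the scheduler bookkeeping
// instead of dereferencing a null RMApp.
RMApp rmApp = rmContext.getRMApps().get(
    applicationAttemptId.getApplicationId());
if (rmApp == null) {
  LOG.warn("Skipping recovery of attempt " + applicationAttemptId
      + " because its application is no longer known to the RM");
  return;
}
{code}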

> Resource manager fails with Null pointer exception
> --
>
> Key: YARN-4347
> URL: https://issues.apache.org/jira/browse/YARN-4347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yesha Vora
>Assignee: Jian He
> Attachments: YARN-4347.1.patch
>
>
> Resource manager fails with NPE while trying to load or recover a finished 
> application. 
> {code}
> 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(597)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-11 Thread Tsuyoshi Ozawa (JIRA)
Tsuyoshi Ozawa created YARN-4348:


 Summary: ZKRMStateStore.syncInternal should wait for 
zkResyncWaitTime instead of zkSessionTimeout
 Key: YARN-4348
 URL: https://issues.apache.org/jira/browse/YARN-4348
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.2
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa


The current internal ZK configuration of ZKRMStateStore can cause the following 
situation:

1. syncInternal times out,
2. but the sync succeeds later on.

{quote}
2015-11-11 11:54:05,728 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1241)) - Failed to sync with ZK new 
connection.  -<--- sync failed
2015-11-11 11:54:05,728 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1244)) - Maxed out ZK retries. Giving up!
2015-11-11 11:54:05,728 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(292)) - Error updating appAttempt: 
appattempt_1447242474882_0002_01
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1447242474882_0002/appattempt_1447242474882_0002_01
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1082)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1079)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1164)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1197)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:1079)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:716)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:286)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:269)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:1006)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1075)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1070)
  at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
  at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
  at java.lang.Thread.run(Thread.java:745)
2015-11-11 11:54:05,729 ERROR recovery.RMStateStore 
(RMStateStore.java:notifyStoreOperationFailed(1027)) - State store operation 
failed
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1447242474882_0002/appattempt_1447242474882_0002_01
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1082)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1079)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1164)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1197)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:1079)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:716)
  at 

[jira] [Updated] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4183:
--
Fix Version/s: (was: 2.7.2)
   (was: 2.8.0)
   (was: 3.0.0)

bq. All of this needs more work, so unless I hear strongly otherwise I am going 
to revert this patch in the interest of 2.7.2's progress.
Seeing no No's, reverted this for the sake of 2.7.2.

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.
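As an illustration of the proposed condition (the constants used by the reverted 
patch may differ; the ones below are assumptions):

{code}
// Illustrative only: decide whether to create a timeline client for the
// system metrics publisher based on the application history / publisher
// flags rather than the global timeline-service flag, so ordinary jobs are
// not forced to fetch a timeline delegation token.
boolean publishSystemMetrics =
    conf.getBoolean(YarnConfiguration.APPLICATION_HISTORY_ENABLED, false)
        && conf.getBoolean(
            YarnConfiguration.SYSTEM_METRICS_PUBLISHER_ENABLED, false);
{code}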



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4183:
--
Target Version/s: 2.7.3  (was: 2.7.2)

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4346) Test committer.commitJob() behavior during committing when MR AM get failed.

2015-11-11 Thread Junping Du (JIRA)
Junping Du created YARN-4346:


 Summary: Test committer.commitJob() behavior during committing 
when MR AM get failed.
 Key: YARN-4346
 URL: https://issues.apache.org/jira/browse/YARN-4346
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Junping Du


In MAPREDUCE-5485, we are adding an additional API (isCommitJobRepeatable) to 
allow job commit to tolerate AM failure in some cases (like FileOutputCommitter 
with the v2 algorithm). Although we have unit tests covering most of the flows, we 
may want a complete end-to-end test to verify the whole workflow; see the sketch 
after this list.
The scenarios include:
1. For FileOutputCommitter (or some subclass), emulate an MR AM failure or 
restart while commitJob() is in progress.
2. Check the different behavior for v1 and v2 (supporting isCommitJobRepeatable() 
or not).
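A rough shape of the check at the heart of such a test (job and committer setup 
omitted; the constant and the API come from MAPREDUCE-5485, and the surrounding 
variables are assumptions):

{code}
// Illustrative only: drive both committer algorithms and branch on the new
// isCommitJobRepeatable() API when deciding the expected outcome after an
// AM failure during commitJob().
conf.setInt(FileOutputCommitter.FILEOUTPUTCOMMITTER_ALGORITHM_VERSION, 2);
FileOutputCommitter committer = new FileOutputCommitter(outputPath, jobContext);
if (committer.isCommitJobRepeatable(jobContext)) {
  // v2-style algorithm: a restarted AM may safely repeat commitJob(),
  // so the job is expected to succeed after the emulated AM failure
} else {
  // v1-style algorithm: repeating commitJob() is not safe,
  // so the job is expected to fail rather than re-run the commit
}
{code}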



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834

2015-11-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-4032:
-

Assignee: Jian He

> Corrupted state from a previous version can still cause RM to fail with NPE 
> due to same reasons as YARN-2834
> 
>
> Key: YARN-4032
> URL: https://issues.apache.org/jira/browse/YARN-4032
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-4032.prelim.patch
>
>
> YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if 
> someone is upgrading from a previous version, the state can still be 
> inconsistent and then RM will still fail with NPE after upgrade to 2.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000962#comment-15000962
 ] 

Xuan Gong commented on YARN-4234:
-

Attached a new patch with the renamed configurations.

bq. CacheId -> TimelineEntityGroupId ? And similarly rename it everywhere. 

DONE

bq. Also move this to org.apache.hadoop.yarn.api.records.timeline.

DONE

bq. Move CacheIdProto from yarn_protos.proto also accordingly

I could not find a better place, so it stays in yarn_protos.proto.

bq. Reorder the API parameters

DONE.

bq. entity-file-fd.flush-interval-secs -> 
entity-file-store.fd-flush-interval-secs

DONE

bq. Similarly entity-file-fd.clean-interval-sec and entity-file-fd.retain-secs

DONE

bq. Why do we need TIMELINE_SERVICE_PLUGIN_ENABLED especially if we also have 
this TimelineEntityGroupId/CacheId as part of the writer API? 

REMOVED

bq. TimelineClientImpl is doing two many things. Let's have a 
DirectTimelineWriter vs HDFSTimelineWriter which can encapsulate functionality.

Will do it later.

Also, I added a new configuration: TIMELINE_SERVICE_VERSION. If we want to use 
ATS v1.5, we need to set this configuration to 1.5. When we try to use the new 
APIs for ATS v1.5, a sanity check will be enforced (a rough sketch follows below).
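A rough sketch of what that sanity check could look like (the config constant 
name and the float comparison are assumptions about the patch, not its exact code):

{code}
// Illustrative only: refuse the new ATS v1.5 put APIs unless the cluster is
// explicitly configured for timeline service version 1.5.
float version = conf.getFloat(YarnConfiguration.TIMELINE_SERVICE_VERSION, 1.0f);
if (Math.abs(version - 1.5f) > 0.00001f) {
  throw new YarnException("The timeline entity-group put APIs require "
      + "yarn.timeline-service.version to be set to 1.5, but it is " + version);
}
{code}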

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234.2015.1.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001009#comment-15001009
 ] 

Hudson commented on YARN-4183:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #670 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/670/])
YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 
6351d3fa638f1d901279cef9e56dc4e07ef3de11)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java


> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000857#comment-15000857
 ] 

Li Lu commented on YARN-4183:
-

Discussed with [~xgong] offline, and it seems like YARN-4234 is blocked on the 
proposed {{yarn.timeline-service.version}} config. If this is the case, maybe we 
can add the config in YARN-4234 and have it reviewed quickly?

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001006#comment-15001006
 ] 

Hadoop QA commented on YARN-4345:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 5s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 39s 
{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 11s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 36s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_60. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 39s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_79. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 106m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_60 Timed out junit tests | 
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_79 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_79 Timed out junit tests | 
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-11 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771808/YARN-4345.patch |
| JIRA Issue | YARN-4345 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  

[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234-2015.2.patch

Removed TimelineEntityGroupIdProto and TimelineEntityGroupIdPBImpl

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015.2.patch, YARN-4234.1.patch, 
> YARN-4234.2.patch, YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2480) DockerContainerExecutor must support user namespaces

2015-11-11 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001610#comment-15001610
 ] 

Erik Weathers commented on YARN-2480:
-

Experimental support for user-namespaces in Docker has landed in version 1.9:
* http://integratedcode.us/2015/10/13/user-namespaces-have-arrived-in-docker/
* https://github.com/docker/docker/pull/12648

It's available via experimental builds (not in the normal 1.9 build); see this 
link for info on getting those builds:
* https://github.com/docker/docker/tree/master/experimental 

Documentation on the feature:
* 
https://github.com/docker/docker/blob/3b5fac462d21ca164b3778647420016315289034/experimental/userns.md

> DockerContainerExecutor must support user namespaces
> 
>
> Key: YARN-2480
> URL: https://issues.apache.org/jira/browse/YARN-2480
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Abin Shahab
>  Labels: security
>
> When DockerContainerExector launches a container, the root inside that 
> container has root privileges on the host. 
> This is insecure in a mult-tenant environment. The uid of the container's 
> root user must be mapped to a non-privileged user on the host.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001061#comment-15001061
 ] 

Hudson commented on YARN-4183:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1394 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1394/])
YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 
6351d3fa638f1d901279cef9e56dc4e07ef3de11)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java


> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-11-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001134#comment-15001134
 ] 

Jian He commented on YARN-1509:
---

lgtm, thanks Meng !

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.10.patch, 
> YARN-1509.2.patch, YARN-1509.3.patch, YARN-1509.4.patch, YARN-1509.5.patch, 
> YARN-1509.6.patch, YARN-1509.7.patch, YARN-1509.8.patch, YARN-1509.9.patch
>
>
> As described in YARN-1197, we need add API in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000893#comment-15000893
 ] 

Hudson commented on YARN-4183:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8793 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8793/])
YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 
6351d3fa638f1d901279cef9e56dc4e07ef3de11)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt


> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001047#comment-15001047
 ] 

Hudson commented on YARN-4183:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2598 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2598/])
YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 
6351d3fa638f1d901279cef9e56dc4e07ef3de11)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java


> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001064#comment-15001064
 ] 

Hadoop QA commented on YARN-4132:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 1s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 20s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 29s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 29s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_85. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 31s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 221, now 220). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s 
{color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 50s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 14s {color} 
| {color:red} hadoop-yarn-server-common in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 

[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001108#comment-15001108
 ] 

Li Lu commented on YARN-4234:
-

Hi [~Naganarasimha], IIUC the TIMELINE_SERVICE_VERSION config describes the 
server's version. The client API implementation will perform a sanity check to 
avoid calling a wrong (lower) version of the API against the server. This is 
orthogonal to assuming a particular version of ATS; the application can still 
decide by itself which API is the correct one to call. Am I missing anything 
here? Thanks. 
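
To make the idea concrete, a minimal sketch of such a client-side sanity check, assuming the key is the proposed {{yarn.timeline-service.version}} and the version is compared as a simple float; the class, constant names, and default are illustrative, not the final API:

{code}
import org.apache.hadoop.conf.Configuration;

class TimelineVersionCheck {
  // Illustrative constants only; the real ones would live in YarnConfiguration.
  static final String TIMELINE_SERVICE_VERSION = "yarn.timeline-service.version";
  static final float DEFAULT_TIMELINE_SERVICE_VERSION = 1.0f;

  /**
   * True if the configured server version is at least the version required
   * by the API the client is about to call.
   */
  static boolean versionSupported(Configuration conf, float requiredVersion) {
    return conf.getFloat(TIMELINE_SERVICE_VERSION,
        DEFAULT_TIMELINE_SERVICE_VERSION) >= requiredVersion;
  }
}
{code}

A v1.5 put API could then refuse to proceed unless {{versionSupported(conf, 1.5f)}} holds.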

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-11-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001141#comment-15001141
 ] 

Wangda Tan commented on YARN-1509:
--

+1 to the latest patch, will commit it tomorrow if there are no objections.

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.10.patch, 
> YARN-1509.2.patch, YARN-1509.3.patch, YARN-1509.4.patch, YARN-1509.5.patch, 
> YARN-1509.6.patch, YARN-1509.7.patch, YARN-1509.8.patch, YARN-1509.9.patch
>
>
> As described in YARN-1197, we need add API in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3784) Indicate preemption timout along with the list of containers to AM (preemption message)

2015-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001145#comment-15001145
 ] 

Hadoop QA commented on YARN-3784:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 3s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
22s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 37s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 14s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 12s 
{color} | {color:red} Patch generated 5 new checkstyle issues in root (total 
was 392, now 389). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 53s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_60. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 26s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 30s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 51s {color} 
| {color:red} hadoop-mapreduce-client-app in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 4s 
{color} | {color:green} hadoop-sls in the patch passed with JDK v1.8.0_60. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_79. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 23s 

[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001093#comment-15001093
 ] 

Naganarasimha G R commented on YARN-4234:
-

Hi [~xgong] & [~gtCarrera],
Before we go ahead with supporting clients specifying the ATS version, it would 
be better to have a look at YARN-3488, which discusses the issues with each 
application carrying a client-side configuration of which timeline version to 
use. 
Basically, it would be better if the validation happens on the server side; if 
it is done on the client side, it is more likely that an app assumes a 
particular ATS version while the server is actually running a different one.




> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2015-11-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001126#comment-15001126
 ] 

Jason Lowe commented on YARN-4265:
--

Thanks for the patch, [~gtCarrera9]!  It looks like most of the patch is a 
copy of the entity timeline store from YARN-3942 with a few edits, so I'm sorta 
reviewing my own code here.  As such I did a diff of the patch from this JIRA 
and the one from YARN-3942 so I could focus on what's changed.  I'll defer to 
others to review the parts that are identical to YARN-3942.  Eventually I can 
see this being a superset of YARN-3942, since it can cache to memory and either 
cache everything or a subset based on what the plugins decide.

TIMELINE_SERVICE_PLUGIN_ENABLED and DEFAULT_TIMELINE_SERVICE_PLUGIN_ENABLED are 
not needed.

Is there a reason we're not using the Configuration.getInstances method or the 
ReflectionUtils methods to handle plugin loading?
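
For reference, a minimal sketch of plugin loading via {{Configuration.getInstances}}; the property name and the plugin interface below are hypothetical stand-ins, not the patch's actual API:

{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;

// Hypothetical plugin interface, used only for illustration.
interface TimelineCachePlugin {
  String getCacheId(String entityType, String entityId);
}

class PluginLoader {
  // Hypothetical key holding a comma-separated list of plugin class names.
  static final String TIMELINE_PLUGIN_CLASSES =
      "yarn.timeline-service.plugin.classes";

  static List<TimelineCachePlugin> loadPlugins(Configuration conf) {
    // getInstances reads the class list from the key and instantiates each one.
    return conf.getInstances(TIMELINE_PLUGIN_CLASSES, TimelineCachePlugin.class);
  }
}
{code}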

If no plugins are configured (which is the default behavior), do we want a 
fallback plugin that emulates what YARN-3942 is doing?

Are there plans to support the leveldb store as an alternative to the memory 
store for the detail timeline store?  There was concern that a single dag could 
overwhelm the server, and storing it to leveldb instead of the memory store 
would be one way to try to mitigate that.  I'm wondering if the class to use 
for the detail log timeline store should be configurable with 
MemoryTimelineStore as the default.  Could also do this as a followup JIRA if 
necessary.
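
A sketch of what a configurable detail store could look like, assuming a hypothetical property name and keeping {{MemoryTimelineStore}} as the default (illustrative only, not the patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.yarn.server.timeline.MemoryTimelineStore;
import org.apache.hadoop.yarn.server.timeline.TimelineStore;

class DetailStoreFactory {
  // Hypothetical key; the real one would be added to YarnConfiguration
  // and yarn-default.xml.
  static final String DETAIL_STORE_CLASS =
      "yarn.timeline-service.entity-group-store.class";

  static TimelineStore createDetailStore(Configuration conf) {
    Class<? extends TimelineStore> clazz = conf.getClass(
        DETAIL_STORE_CLASS, MemoryTimelineStore.class, TimelineStore.class);
    // ReflectionUtils wires in the Configuration if the store is Configurable.
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}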

Should we add entries to yarn-default.xml for the new properties?

Do we want to log at the info level that a path is being skipped during the 
scan?  The store can end up scanning fairly often in practice, so this could 
end up logging a lot for just one path per scan.  I'm wondering if making it a 
debug log is more appropriate.

Comment in getDoneAppPath mentions a cache ID but it's using an app ID.

Nit: Indentation is off at the start of the YarnConfiguration patch hunk.

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.poc_001.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior as the 
> EntityFileTimelineStore proposed in YARN-3942, but cache date in cache id 
> granularity, instead of application id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2015-11-11 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001155#comment-15001155
 ] 

Li Lu commented on YARN-4265:
-

Thanks [~jlowe]! Yes, the current POC version is heavily based on YARN-3942. I'm 
doing some refactoring right now, but the general logic should be pretty similar 
to yours (thanks for the work!). I'll address your comments in the next 
version of the patch. 

bq. If no plugins are configured (which is the default behavior), do we want a 
fallback plugin that emulates what YARN-3942 is doing?
That's a nice suggestion. We can build a plugin that simply returns the app id 
to simulate this. 

bq. Are there plans to support the leveldb store as an alternative to the 
memory store for the detail timeline store? There was concern that a single dag 
could overwhelm the server, and storing it to leveldb instead of the memory 
store would be one way to try to mitigate that. I'm wondering if the class to 
use for the detail log timeline store should be configurable with 
MemoryTimelineStore as the default. Could also do this as a followup JIRA if 
necessary.
Will do. 

bq. Should add entries to yarn-default.xml for the new properties?
Yes. Will add them. 

bq. Do we want to log at the info level that a path is being skipped during the 
scan? The store can end up scanning fairly often in practice, so this could end 
up logging a lot for just one path per scan. I'm wondering if making it a debug 
log is more appropriate.
Sure. 

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.poc_001.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior as the 
> EntityFileTimelineStore proposed in YARN-3942, but cache date in cache id 
> granularity, instead of application id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001171#comment-15001171
 ] 

Naganarasimha G R commented on YARN-4234:
-

Hi [~gtCarrera], IIUC YARN-3488 points out that if this server-side 
configuration is set differently on the client, then even though the user 
calls the right APIs for the client-side ATS version, the calls will not 
succeed on the server. 
So I was wondering whether, while starting the timeline client, we can connect 
to the backend timeline server and check that the server matches the ATS 
version specified on the client, or, as YARN-3488 suggests, receive the ATS 
version information as part of RM registration. Thoughts?

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001187#comment-15001187
 ] 

Jason Lowe commented on YARN-4345:
--

Thanks for the patch, Junping!  The main change looks OK, but the unit test 
needs some work.  The unit test is missing the test decorator and therefore 
isn't executing.  With that fixed, the unit test is passing even without the 
main fix.
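
To illustrate the "missing test decorator" point: in JUnit 4 a method without the {{@Test}} annotation is silently skipped, so the suite passes whether or not the fix is present. The test name and assertion below are made up for the sketch:

{code}
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class TestUpdateNodeResourceSketch {

  // Without @Test, JUnit 4 never runs this method, so nothing is verified.
  public void updateNodeResourceNotRun() {
    assertTrue(requestCarriesResource());
  }

  // With the annotation the assertion actually executes.
  @Test(timeout = 30000)
  public void testUpdateNodeResource() {
    assertTrue(requestCarriesResource());
  }

  private boolean requestCarriesResource() {
    return true; // placeholder for the real check on the rmadmin request
  }
}
{code}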

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4345.patch
>
>
> YARN-313 add CLI to update node resource. It works fine for batch mode 
> update. However, for single node update "yarn rmadmin -updateNodeResource" 
> failed to work because resource is not set properly in sending request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001192#comment-15001192
 ] 

Junping Du commented on YARN-4345:
--

I think the test failure is not related. I tried the test locally 
(TestYarnClient), and it fails with or without this patch.

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4345.patch
>
>
> YARN-313 add CLI to update node resource. It works fine for batch mode 
> update. However, for single node update "yarn rmadmin -updateNodeResource" 
> failed to work because resource is not set properly in sending request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4345:
-
Target Version/s: 2.7.3  (was: 2.8.0)

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4345.patch
>
>
> YARN-313 add CLI to update node resource. It works fine for batch mode 
> update. However, for single node update "yarn rmadmin -updateNodeResource" 
> failed to work because resource is not set properly in sending request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4345:
-
Attachment: YARN-4345-v2.patch

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4345-v2.patch, YARN-4345.patch
>
>
> YARN-313 add CLI to update node resource. It works fine for batch mode 
> update. However, for single node update "yarn rmadmin -updateNodeResource" 
> failed to work because resource is not set properly in sending request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001224#comment-15001224
 ] 

Li Lu commented on YARN-4234:
-

Well let me clarify one thing: ATS v1.5 supports all ATS v1 features, while ATS 
v2 is not compatible with ATS v1 and 1.5. Therefore, how about this:

timeline-service.version: 1 -> only ATS v1 features are supported. 
timeline-service.version: 1.5 -> supports ATS v1 and v1.5 features.
timeline-service.version: 2 -> only ATS v2 features are supported. 

This describes the feature set on the server side of ATS. The client can use 
this information from the server to perform a sanity check. I agree with 
[~Naganarasimha] that checking this config on the client side does not quite 
make sense, and clearly we need some mechanism to let the client know the 
currently running version of ATS. However, I think this is orthogonal to the 
config key proposed in YARN-4183 (and added here). 
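
The compatibility rule above, written out as a small sketch; representing versions as floats is an illustrative choice, not the actual representation used in YARN:

{code}
class TimelineVersionCompat {
  // 1.5 is a superset of 1, while 2 is incompatible with both 1 and 1.5.
  static boolean isCompatible(float serverVersion, float featureVersion) {
    if (serverVersion >= 2.0f || featureVersion >= 2.0f) {
      // ATS v2 only serves v2 features.
      return serverVersion == featureVersion;
    }
    // Within the v1 line, a newer server (1.5) supports older (1.0) features.
    return serverVersion >= featureVersion;
  }
}
{code}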

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001202#comment-15001202
 ] 

Li Lu commented on YARN-4234:
-

[~Naganarasimha] Sure, we can certainly add those extra sanity checks on the 
client. However, I believe we need this config on the server side in any case 
to describe the highest supported version number. 

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001207#comment-15001207
 ] 

Junping Du commented on YARN-4345:
--

Thanks Jason for the review! I have uploaded a v2 patch to address your comments. 

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4345-v2.patch, YARN-4345.patch
>
>
> YARN-313 add CLI to update node resource. It works fine for batch mode 
> update. However, for single node update "yarn rmadmin -updateNodeResource" 
> failed to work because resource is not set properly in sending request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001239#comment-15001239
 ] 

Hudson commented on YARN-4183:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #658 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/658/])
YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 
6351d3fa638f1d901279cef9e56dc4e07ef3de11)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java


> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4347) Resource manager fails with Null pointer exception

2015-11-11 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-4347:


 Summary: Resource manager fails with Null pointer exception
 Key: YARN-4347
 URL: https://issues.apache.org/jira/browse/YARN-4347
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Yesha Vora


Resource manager fails with NPE while trying to load or recover a finished 
application. 
{code}
2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager 
(ResourceManager.java:serviceStart(597)) - Failed to load/recover state
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4347) Resource manager fails with Null pointer exception

2015-11-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-4347:
-

Assignee: Jian He

> Resource manager fails with Null pointer exception
> --
>
> Key: YARN-4347
> URL: https://issues.apache.org/jira/browse/YARN-4347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yesha Vora
>Assignee: Jian He
>
> Resource manager fails with NPE while trying to load or recover a finished 
> application. 
> {code}
> 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(597)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-11-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001278#comment-15001278
 ] 

Jason Lowe commented on YARN-4132:
--

Patch doesn't build.  Other comments:

Do we need to get rid of the old RMProxy.createRetryPolicy(conf) method?  We 
can just leave that around to reduce code churn on the patch and have it 
implemented in terms of the new createRetryPolicy method.

Similarly for maximum backwards compatibility we could leave around an 
RMProxy.createRMProxy method that doesn't take the extra values and calls the 
old createRetryPolicy method. A new private utility method, like 
createRMProxy(conf, protocol, instance, retryPolicy) could be added to factor 
out the common code between the two createRMProxy methods.

"to connection to RM" should be "to connect to the RM" in the property 
descriptions in yarn-default.xml.
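
A rough sketch of the backwards-compatible shape suggested above for the retry-policy side, shown as a stand-in class rather than the real RMProxy; the extra parameter names of the new overload are placeholders, and only the delegation pattern is the point:

{code}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

class RetryPolicySketch {
  static final String MAX_WAIT_MS = "yarn.resourcemanager.connect.max-wait.ms";
  static final String RETRY_INTERVAL_MS =
      "yarn.resourcemanager.connect.retry-interval.ms";

  // Old entry point kept for compatibility; it simply delegates to the new
  // overload, reading the client-facing values from the configuration.
  static RetryPolicy createRetryPolicy(Configuration conf) {
    return createRetryPolicy(conf,
        conf.getLong(MAX_WAIT_MS, 15 * 60 * 1000L),
        conf.getLong(RETRY_INTERVAL_MS, 30 * 1000L));
  }

  // New overload taking the values explicitly, so the NM can pass its own
  // (more aggressive) settings without duplicating the policy-building code.
  static RetryPolicy createRetryPolicy(Configuration conf, long maxWaitMs,
      long retryIntervalMs) {
    if (maxWaitMs < 0) {
      return RetryPolicies.retryForeverWithFixedSleep(retryIntervalMs,
          TimeUnit.MILLISECONDS);
    }
    return RetryPolicies.retryUpToMaximumTimeWithFixedSleep(maxWaitMs,
        retryIntervalMs, TimeUnit.MILLISECONDS);
  }
}
{code}

The same pattern would apply to createRMProxy: keep the existing public signature delegating to a private createRMProxy(conf, protocol, instance, retryPolicy) helper that holds the shared code.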


> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.3.patch, YARN-4132.4.patch, 
> YARN-4132.5.patch, YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering

2015-11-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001309#comment-15001309
 ] 

Jason Lowe commented on YARN-4051:
--

Thanks for updating the patch!

Should the value be infinite by default?  The concern is that if one container 
has issues recovering (due to log aggregation woes or whatever) then we risk 
expiring all of the containers on this node if we don't re-register with the RM 
within the node expiry interval.  I think it makes sense if we have also fixed 
the recovery paths so there aren't potentially long-running  procedures (like 
contacting HDFS) during the recovery process.  If we haven't then we could 
create as many problems as we're solving by waiting forever.

Why does the patch change the check interval?  If it's to reduce the logging 
then we can better fix that by only logging when the status changes rather than 
every iteration.

Nit: A value of zero should also be treated as a disabled max time.

Nit: "Max time to wait NM to complete container recover before register to RM " 
should be "Max time NM will wait to complete container recovery before 
registering with the RM".
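
A minimal sketch of the waiting loop with the semantics discussed above, where both zero and negative values disable the max wait; the property name, default, and helper interface are placeholders, not the patch's actual ones:

{code}
import org.apache.hadoop.conf.Configuration;

class RecoveryWaitSketch {
  // Placeholder key/default; the real ones would live in YarnConfiguration
  // and yarn-default.xml.
  static final String RECOVERY_WAIT_MS = "yarn.nodemanager.recovery.wait.ms";
  static final long DEFAULT_RECOVERY_WAIT_MS = -1; // <= 0: no max wait

  interface RecoveryStatus {
    boolean isDone();
  }

  static void waitForContainerRecovery(Configuration conf,
      RecoveryStatus recovery) throws InterruptedException {
    long maxWaitMs = conf.getLong(RECOVERY_WAIT_MS, DEFAULT_RECOVERY_WAIT_MS);
    long deadline = System.currentTimeMillis() + maxWaitMs;
    while (!recovery.isDone()) {
      // Zero and negative values both disable the max wait, per the nit above.
      if (maxWaitMs > 0 && System.currentTimeMillis() >= deadline) {
        break; // stop waiting and register with the RM anyway
      }
      Thread.sleep(1000); // fixed check interval; log only when status changes
    }
  }
}
{code}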

> ContainerKillEvent is lost when container is  In New State and is recovering
> 
>
> Key: YARN-4051
> URL: https://issues.apache.org/jira/browse/YARN-4051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
>Priority: Critical
> Attachments: YARN-4051.01.patch, YARN-4051.02.patch, 
> YARN-4051.03.patch, YARN-4051.04.patch
>
>
> As in YARN-4050, NM event dispatcher is blocked, and container is in New 
> state, when we finish application, the container still alive even after NM 
> event dispatcher is unblocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001318#comment-15001318
 ] 

Naganarasimha G R commented on YARN-4234:
-

Thanks [~gtCarrera] for updating me offline about the ATS 1.5 approach; it 
seems to be different from the existing approaches in v1 and v2: the *timeline 
client* is able to write directly to HDFS without the knowledge of the ATS 
server (reader). 
One small suggestion, though: can the configuration information related to 1.5 
(like activePath) be obtained from the server rather than the client relying on 
its local configuration? That would avoid configuration conflicts. I hope my 
understanding is right; correct me if I am wrong.

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception

2015-11-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001277#comment-15001277
 ] 

Jian He commented on YARN-4347:
---

Thanks [~yeshavora] for reporting this.
The issue here is that the attempt's final state was not previously recorded, 
while the app's final state was, so an NPE is thrown on recovery.

Working on a patch to fix the attempt recovery path: if the corresponding app's 
recovered state is a final state, move the attempt to its final state 
accordingly instead of adding it to the scheduler again. 
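
A rough, self-contained sketch of the guard being described; the enum and helper below are simplified stand-ins for the RM internals named in the stack trace, not the actual patch:

{code}
class AttemptRecoverySketch {

  enum AttemptState { RECOVERING, FINISHED, FAILED, KILLED, LAUNCHED }

  /** Returns the state the recovered attempt should move to. */
  static AttemptState recoverAttempt(AttemptState appRecoveredFinalState,
      Runnable addAttemptToScheduler) {
    if (appRecoveredFinalState != null && isFinal(appRecoveredFinalState)) {
      // The app already finished before the restart: settle the attempt in its
      // final state instead of re-adding it to the scheduler, which is the
      // path that NPEs in CapacityScheduler.addApplicationAttempt.
      return appRecoveredFinalState;
    }
    addAttemptToScheduler.run(); // normal recovery path
    return AttemptState.LAUNCHED;
  }

  static boolean isFinal(AttemptState s) {
    return s == AttemptState.FINISHED || s == AttemptState.FAILED
        || s == AttemptState.KILLED;
  }
}
{code}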

> Resource manager fails with Null pointer exception
> --
>
> Key: YARN-4347
> URL: https://issues.apache.org/jira/browse/YARN-4347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yesha Vora
>Assignee: Jian He
>
> Resource manager fails with NPE while trying to load or recover a finished 
> application. 
> {code}
> 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(597)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

