[jira] [Commented] (YARN-4815) ATS 1.5 timeline client impl tries to create attempt directory for every event call

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199541#comment-15199541
 ] 

Junping Du commented on YARN-4815:
--

bq.  My concern is that we do not need to have separate caches for each use 
case if they can be modeled by Guava.
I see. There are pros and cons to using a third-party library like Guava. The good 
thing is that it simplifies the implementation; for example, we don't need to handle 
synchronization ourselves in this case. However, the general concern is that the 
consistency (of interfaces and behavior) of its APIs is not that promising (even the 
Java API has this problem, but there we have no other choice), and it is also hard to 
follow bug fixes across different versions. I don't have a strong preference for or 
against Guava, but that alone shouldn't hold the patch back, since there is no 
community consensus on this yet.
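
For reference, a minimal sketch of what the Guava-based variant could look like. This is illustrative only, not the actual YARN-4815 patch; the {{AttemptDir}} type and the path layout are hypothetical. The point is that the cache handles synchronization and single-loading per key internally:

{code}
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class GuavaAttemptDirCacheSketch {
  // Hypothetical value type standing in for whatever per-attempt state is cached.
  static class AttemptDir {
    final String path;
    AttemptDir(String path) { this.path = path; }
  }

  public static void main(String[] args) {
    // The cache handles synchronization and eviction internally; the loader
    // runs at most once per key even under concurrent access.
    LoadingCache<String, AttemptDir> dirs = CacheBuilder.newBuilder()
        .maximumSize(1000)                         // bound the number of cached attempts
        .expireAfterAccess(10, TimeUnit.MINUTES)   // drop idle entries
        .build(new CacheLoader<String, AttemptDir>() {
          @Override
          public AttemptDir load(String attemptId) {
            // Expensive work (e.g. creating the attempt directory) happens here, once.
            return new AttemptDir("/tmp/ats-active/" + attemptId);
          }
        });

    System.out.println(dirs.getUnchecked("appattempt_1458000000000_0001_000001").path);
  }
}
{code}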

Any other comments on the patch?

> ATS 1.5 timeline client impl tries to create attempt directory for every event 
> call
> 
>
> Key: YARN-4815
> URL: https://issues.apache.org/jira/browse/YARN-4815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4815.1.patch
>
>
> The ATS 1.5 timeline client impl tries to create the attempt directory for every 
> event call. Since one directory-creation call per attempt is enough, this is 
> causing a perf issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-19 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4746:
---
Attachment: 0004-YARN-4746.patch

Attaching the latest patch with the test case fix.

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch, 
> 0003-YARN-4746.patch, 0003-YARN-4746.patch, 0004-YARN-4746.patch
>
>
> I'm seeing, somewhere in my WS API tests, an error with exception conversion 
> of a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services suggests the same problem.
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by Jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised exception and explicitly converting it.
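
For illustration, a minimal sketch of the proposed conversion, assuming Hadoop's {{ConverterUtils.toApplicationId}} helper and plain JAX-RS to produce the 400; the actual patch may use YARN's own web-app exception types instead:

{code}
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.Response;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AppIdParseSketch {
  // Parse the appId argument; map a malformed value to HTTP 400 instead of
  // letting the IllegalArgumentException bubble up as a 500.
  static ApplicationId parseApplicationId(String appId) {
    try {
      return ConverterUtils.toApplicationId(appId);
    } catch (IllegalArgumentException e) {
      throw new WebApplicationException(e,
          Response.status(Response.Status.BAD_REQUEST)
              .entity("Invalid application id: " + appId)
              .build());
    }
  }

  public static void main(String[] args) {
    System.out.println(parseApplicationId("application_1458000000000_0001"));
  }
}
{code}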



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-03-19 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200838#comment-15200838
 ] 

Rohith Sharma K S commented on YARN-4837:
-

Adding to the discussion about AM blacklisting, there are a few corner cases where 
an application can hang forever, especially if, after some nodes are blacklisted, 
other nodes are removed from the cluster. In such cases, there is no mechanism 
to remove the blacklisted nodes for the AM. See YARN-4685 for one such 
scenario.

> User facing aspects of 'AM blacklisting' feature need fixing
> 
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed 
> before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-03-19 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200518#comment-15200518
 ] 

Li Lu commented on YARN-4517:
-

Hi [~varun_saxena], thanks for the note! Right now I'm fine with moving forward 
as a POC and keeping all related issues tracked in new JIRAs. 

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need nodes page added to next generation web UI, similar to existing 
> RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2016-03-19 Thread Shiwei Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiwei Guo updated YARN-3933:
-
Attachment: YARN-3933.003.patch

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
> Attachments: YARN-3933.001.patch, YARN-3933.002.patch, 
> YARN-3933.003.patch
>
>
> In our cluster we are seeing available memory and cores go negative. 
> Initial inspection:
> Scenario no. 1: 
> In the capacity scheduler, the method allocateContainersToNode() checks 
> whether there are excess reservations of containers for an application; if they 
> are no longer needed, it calls queue.completedContainer(), which causes 
> resources to go negative, since they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess container assignments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4812) TestFairScheduler#testContinuousScheduling fails intermittently

2016-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199629#comment-15199629
 ] 

Hudson commented on YARN-4812:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9472 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9472/])
YARN-4812. TestFairScheduler#testContinuousScheduling fails (kasha: rev 
f84af8bd588763c4e99305742d8c86ed596e8359)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java


> TestFairScheduler#testContinuousScheduling fails intermittently
> ---
>
> Key: YARN-4812
> URL: https://issues.apache.org/jira/browse/YARN-4812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4812-1.patch
>
>
> This test has failed in the past, and there seem to be more issues. 
> {noformat}
> java.lang.AssertionError: expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3816)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4607) AppAttempt page TotalOutstandingResource Requests table support pagination

2016-03-19 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4607:
---
Attachment: 0001-YARN-4607.patch

[~rohithsharma]
Could you please review the attached patch?

> AppAttempt page TotalOutstandingResource Requests table support pagination
> --
>
> Key: YARN-4607
> URL: https://issues.apache.org/jira/browse/YARN-4607
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4607.patch
>
>
> Simulating a cluster with 10 racks of 100 nodes using SLS, the Total 
> Outstanding Resource Requests table consumes the complete page.
> It would be good to support pagination for this table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt

2016-03-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201687#comment-15201687
 ] 

Jason Lowe commented on YARN-4839:
--

bq. Could this be the same issue as pointed out by YARN-4247?

It is essentially the same core issue, but it wasn't caused by YARN-2005.  We 
don't have that change in our build, but we do have YARN-3116.  That's the 
first time getMasterContainer was called from SchedulerApplicationAttempt.  
Without the side-effect of YARN-3361 it leads to a deadlock since 
getMasterContainer tries to grab the lock.
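
To make the cycle concrete, a stripped-down illustration of the lock ordering involved; plain monitors stand in for the real RMAppAttemptImpl/SchedulerApplicationAttempt locks, and this is not the actual RM code:

{code}
public class LockOrderDeadlockSketch {
  private final Object attemptLock = new Object();   // stands in for RMAppAttemptImpl's lock
  private final Object schedulerLock = new Object(); // stands in for SchedulerApplicationAttempt's lock

  // Thread A: the scheduler path holds its own lock, then calls into the attempt.
  void getResourceUsageReportPath() {
    synchronized (schedulerLock) {
      synchronized (attemptLock) {   // blocks if thread B already holds attemptLock
        // ... read the master container ...
      }
    }
  }

  // Thread B: the attempt state machine holds its lock, then calls into the scheduler.
  void attemptTransitionPath() {
    synchronized (attemptLock) {
      synchronized (schedulerLock) { // blocks if thread A already holds schedulerLock
        // ... build the resource usage report ...
      }
    }
  }
}
{code}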


> ResourceManager deadlock between RMAppAttemptImpl and 
> SchedulerApplicationAttempt
> -
>
> Key: YARN-4839
> URL: https://issues.apache.org/jira/browse/YARN-4839
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> Hit a deadlock in the ResourceManager as one thread was holding the 
> SchedulerApplicationAttempt lock and trying to call 
> RMAppAttemptImpl.getMasterContainer while another thread had the 
> RMAppAttemptImpl lock and was trying to call 
> SchedulerApplicationAttempt.getResourceUsageReport.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage

2016-03-19 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200150#comment-15200150
 ] 

Daniel Templeton commented on YARN-4767:


Nevermind.  Looks like I have a new round of checkstyle issues to resolve, and 
I broke some tests.

> Network issues can cause persistent RM UI outage
> 
>
> Key: YARN-4767
> URL: https://issues.apache.org/jira/browse/YARN-4767
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4767.001.patch, YARN-4767.002.patch, 
> YARN-4767.003.patch
>
>
> If a network issue causes an AM web app to resolve the RM proxy's address to 
> something other than what's listed in the allowed proxies list, the 
> AmIpFilter will 302 redirect the RM proxy's request back to the RM proxy.  
> The RM proxy will then consume all available handler threads connecting to 
> itself over and over, resulting in an outage of the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201886#comment-15201886
 ] 

Eric Payne commented on YARN-4686:
--

[~ebadger], Thanks for the work you have done resolving this issue. I merged 
{{YARN-4686.006.patch}} to trunk and cherry-picked to branch-2 and branch-2.8. 
There were enough conflicts with the cherry-pick to branch-2.7 that I think it 
would be best if you provided a separate patch. The way it is written now, it 
has dependencies on JIRAs that were not backported to 2.7 (e.g., YARN-41).

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, 
> YARN-4686.005.patch, YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is the trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200203#comment-15200203
 ] 

Jason Lowe commented on YARN-4686:
--

bq. Still interested in if Jason Lowe or Karthik Kambatla have comments, 
especially about removal of the (extra) threads in startResourceManager and 
serviceStart methods.

The thread removal is key, IMHO.  MiniYARNCluster was a source of flaky tests 
because those threads allowed the mini cluster to return from its start method 
before its subcomponents completed their start methods.  That means tests that 
assumed the cluster was started after cluster.start() were making a bad 
assumption.  Removing these threads means the cluster really is started after 
the start method, assuming the RM and NM start methods correctly return only 
after they have started.

+1 patch looks good to me.  I'm OK either way on the blind or checked 
transition to active since it's a fast no-op in the non-HA case.  It will 
generate an extra "Already in active state" info message in the test logs but 
is otherwise benign.
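
As a usage-level illustration of why the synchronous start matters for tests, a small sketch; it assumes the standard {{MiniYARNCluster(String, int, int, int)}} constructor and the test name is arbitrary:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterSyncStartSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    MiniYARNCluster cluster =
        new MiniYARNCluster("sync-start-demo", 1, 1, 1); // test name, #NMs, #localDirs, #logDirs
    cluster.init(conf);
    cluster.start();   // with the extra threads gone, this returns only after
                       // the RM and NM services have actually started
    try {
      // Safe to inspect cluster state immediately; no polling or sleeps needed.
      System.out.println("RM running: " + (cluster.getResourceManager() != null));
    } finally {
      cluster.stop();
    }
  }
}
{code}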


> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, 
> YARN-4686.005.patch, YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is the trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201876#comment-15201876
 ] 

Hudson commented on YARN-4686:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9474 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9474/])
YARN-4686. MiniYARNCluster.start() returns before cluster is completely 
(epayne: rev 92b7e0d41302b6b110927f99de5c2b4a4a93c5fd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYARNClusterForHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, 
> YARN-4686.005.patch, YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is the trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4685) AM blacklisting result in application to get hanged

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201117#comment-15201117
 ] 

Vinod Kumar Vavilapalli commented on YARN-4685:
---

There are simpler cases which are busted too. For example, if an AM failed on a 
node, that node will *never* be considered again for launching this app's AM as it 
is within the blacklist threshold. In a busy cluster where this node continues 
to be the only free one for a while, we will keep skipping the machine.

> AM blacklisting result in application to get hanged
> ---
>
> Key: YARN-4685
> URL: https://issues.apache.org/jira/browse/YARN-4685
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> AM blacklist addition or removal is updated only when the RMAppAttempt is 
> scheduled, i.e. in {{RMAppAttemptImpl#ScheduleTransition#transition}}. But once 
> the attempt is scheduled, if any node is removed from or added to the cluster, 
> this is not propagated to {{BlackListManager#refreshNodeHostCount}}. This leads 
> the BlackListManager to operate on a stale NM count, and the application stays 
> in the ACCEPTED state and waits forever even if we add more nodes to the cluster.
> The solution is to update the BlacklistManager on every 
> {{RMAppAttemptImpl#AMContainerAllocatedTransition#transition}} call. This 
> ensures that any addition/removal of nodes is propagated to the 
> BlacklistManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles

2016-03-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197750#comment-15197750
 ] 

Arun Suresh commented on YARN-3926:
---

bq. RM subset - as long as the NM resource types are a superset of the RM's, the 
handshake proceeds - I believe this will address your concerns. Correct?
That should work... But I feel *allow mismatch* should perhaps be the default: 
if the NM has a super-set of the RM's resource types, the extras are simply ignored; 
if it has a sub-set, then for those specific resource-types the RM assigns a 0 value 
for that NM.

This would also facilitate my other point: allow NMs to dynamically advertise new 
/ disable existing resource types (the NM would learn of these new types via some 
admin API or self-discovery) as part of the NM heartbeat. Similarly, on the RM 
side, if a new resource advertised by the NM is unknown to the RM, it just 
ignores it. We can also add an admin API on the RM to add / remove allowable 
resource types on the fly.
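
A tiny sketch of the *allow mismatch* reconciliation described above; this is a hypothetical helper, not actual RM code, and the resource-type names are just examples:

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ResourceTypeMismatchSketch {
  // Reconcile NM-reported resource values against the RM's configured
  // resource types under an "allow mismatch" policy.
  static Map<String, Long> reconcile(Set<String> rmTypes, Map<String, Long> nmReported) {
    Map<String, Long> effective = new HashMap<>();
    for (String type : rmTypes) {
      // Types the NM did not report default to 0 on the RM side; extra NM
      // types (the superset case) are simply ignored.
      effective.put(type, nmReported.getOrDefault(type, 0L));
    }
    return effective;
  }

  public static void main(String[] args) {
    Set<String> rmTypes = new HashSet<>(Arrays.asList("memory-mb", "vcores", "gpu"));
    Map<String, Long> nmReported = new HashMap<>();
    nmReported.put("memory-mb", 8192L);
    nmReported.put("vcores", 8L);
    nmReported.put("fpga", 2L); // unknown to the RM, ignored

    System.out.println(reconcile(rmTypes, nmReported)); // gpu=0, fpga dropped
  }
}
{code}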

> Extend the YARN resource model for easier resource-type management and 
> profiles
> ---
>
> Key: YARN-3926
> URL: https://issues.apache.org/jira/browse/YARN-3926
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Proposal for modifying resource model and profiles.pdf
>
>
> Currently, there are efforts to add support for various resource-types such 
> as disk(YARN-2139), network(YARN-2140), and  HDFS bandwidth(YARN-2681). These 
> efforts all aim to add support for a new resource type and are fairly 
> involved efforts. In addition, once support is added, it becomes harder for 
> users to specify the resources they need. All existing jobs have to be 
> modified, or have to use the minimum allocation.
> This ticket is a proposal to extend the YARN resource model to a more 
> flexible model which makes it easier to support additional resource-types. It 
> also considers the related aspect of “resource profiles” which allow users to 
> easily specify the various resources they need for any given container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1508) Rename ResourceOption and document resource over-commitment cases

2016-03-19 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1508:
-
Priority: Major  (was: Minor)

> Rename ResourceOption and document resource over-commitment cases
> -
>
> Key: YARN-1508
> URL: https://issues.apache.org/jira/browse/YARN-1508
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>
> Per Vinod's comment in 
> YARN-312(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087)
>  and Bikas' comment in 
> YARN-311(https://issues.apache.org/jira/browse/YARN-311?focusedCommentId=13848615),
>  the name of ResourceOption is not clear enough to be easily understood. Also, we 
> need to document resource over-commitment timing and use cases in more detail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2016-03-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200154#comment-15200154
 ] 

Varun Saxena commented on YARN-2962:


Thanks, Daniel, for the review. 

bq. One optimization I might consider is to only add the splits that actually 
exist to the rmAppRootHierarchies since I would assume that the common case 
will be to not use splits.
That is done in {{loadRMAppState}}: only those hierarchies which have apps are 
considered. We could go further and never create the directories which are not 
configured at all, but I went with this approach.

bq. what happens if an app happens to exist in more than one split? 
That should not happen unless somebody performs manual operations on the state 
store, and in that case a lot can go wrong with the state store anyway. So we do 
make some assumptions, as the previous code did.
However, I am not sure whether we need to fence certain operations, as I 
highlighted in an earlier comment. I could not come up with a problematic 
scenario; because state store operations are synchronized, inconsistency should 
not occur.

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.04.patch, 
> YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4823) Refactor the nested reservation id field in listReservation to simple string field

2016-03-19 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4823:
-
Summary: Refactor the nested reservation id field in listReservation to 
simple string field  (was: Rename the nested reservation id field in 
listReservation to ID)

> Refactor the nested reservation id field in listReservation to simple string 
> field
> --
>
> Key: YARN-4823
> URL: https://issues.apache.org/jira/browse/YARN-4823
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> The listReservation REST API returns a ReservationId field which has a nested 
> id field which is also called ReservationId. This JIRA proposes to rename the 
> nested field to id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4732) *ProcessTree classes have too many whitespace issues

2016-03-19 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203040#comment-15203040
 ] 

Gabor Liptak commented on YARN-4732:


[~kasha] Any other changes you would like to see? Thanks

> *ProcessTree classes have too many whitespace issues
> 
>
> Key: YARN-4732
> URL: https://issues.apache.org/jira/browse/YARN-4732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: Gabor Liptak
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4732.1.patch
>
>
> *ProcessTree classes have too many whitespace issues - extra newlines between 
> methods, spaces in empty lines etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4829) Add support for binary units

2016-03-19 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4829:

Attachment: YARN-4829-YARN-3926.001.patch

Attached a file with the fix.

> Add support for binary units
> 
>
> Key: YARN-4829
> URL: https://issues.apache.org/jira/browse/YARN-4829
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4829-YARN-3926.001.patch
>
>
> The units conversion util should have support for binary units.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles

2016-03-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197763#comment-15197763
 ] 

Varun Vasudev edited comment on YARN-3926 at 3/16/16 5:41 PM:
--

bq. That should work... But I feel, maybe allow mismatch should be the default. 
If NM has a super-set of RMs resource types, it will just be ignored, If 
sub-set, then for those specific resource-types, RM will assign a 0 value for 
the NM.

I don't have any particular preference - I can see scenarios for all 3. I'm 
fine with making allow mismatch the default.

bq. We can also add admin API on the RM to add / remove allowable resource 
types on the fly.

This should be do-able, but we need to go through how this will affect running 
apps.


was (Author: vvasudev):
bq. That should work... But I feel, maybe allow mismatch should be the default. 
If NM has a super-set of RMs resource types, it will just be ignored, If 
sub-set, then for those specific resource-types, RM will assign a 0 value for 
the NM.

I don't have any particular preference - I can see scenarios for all 3. I'm 
fine with making allow mismatch the default.

bq. We can also add admin API on the RM to add / remove allowable resource 
types on the fly.

This should be do-able but we need to go through the affect on running apps.

> Extend the YARN resource model for easier resource-type management and 
> profiles
> ---
>
> Key: YARN-3926
> URL: https://issues.apache.org/jira/browse/YARN-3926
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Proposal for modifying resource model and profiles.pdf
>
>
> Currently, there are efforts to add support for various resource-types such 
> as disk(YARN-2139), network(YARN-2140), and  HDFS bandwidth(YARN-2681). These 
> efforts all aim to add support for a new resource type and are fairly 
> involved efforts. In addition, once support is added, it becomes harder for 
> users to specify the resources they need. All existing jobs have to be 
> modified, or have to use the minimum allocation.
> This ticket is a proposal to extend the YARN resource model to a more 
> flexible model which makes it easier to support additional resource-types. It 
> also considers the related aspect of “resource profiles” which allow users to 
> easily specify the various resources they need for any given container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198044#comment-15198044
 ] 

Steve Loughran commented on YARN-4820:
--

I was just looking at the redirect logic and noticed it was handling 302s 
only... If you are treating redirects specially, then 307s ought to be covered 
as well, as those are what should be issued for PUT/POST/DELETE verbs if they 
ever need to redirect.
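
A minimal sketch of the point about status codes; illustrative only, the actual filter logic may differ:

{code}
import java.net.HttpURLConnection;

public class RedirectStatusSketch {
  // Treat both 302 (Found) and 307 (Temporary Redirect) as redirects; 307 is
  // the status a PUT/POST/DELETE should receive so the method is preserved.
  static boolean isRedirect(int status) {
    return status == HttpURLConnection.HTTP_MOVED_TEMP // 302
        || status == 307;                              // java.net has no named constant for 307
  }

  public static void main(String[] args) {
    System.out.println(isRedirect(302)); // true
    System.out.println(isRedirect(307)); // true
    System.out.println(isRedirect(200)); // false
  }
}
{code}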

> ResourceManager web redirects in HA mode drops query parameters
> ---
>
> Key: YARN-4820
> URL: https://issues.apache.org/jira/browse/YARN-4820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4820.001.patch
>
>
> The RMWebAppFilter redirects http requests from the standby to the active. 
> However it drops all the query parameters when it does the redirect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-19 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201871#comment-15201871
 ] 

Naganarasimha G R commented on YARN-4712:
-

Thanks for the review and commit [~varun_saxena], [~sjlee0], [~djp], & 
[~sunilg].

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
>
> Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch, 
> YARN-4712-YARN-2928.v1.004.patch, YARN-4712-YARN-2928.v1.005.patch, 
> YARN-4712-YARN-2928.v1.006.patch
>
>
> There are 2 issues with CPU usage collection:
> * I was able to observe that many times the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor 
> still does the calculation {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore 
> / resourceCalculatorPlugin.getNumProcessors()}}, because of which the UNAVAILABLE 
> check in {{NMTimelinePublisher.reportContainerResourceUsage}} is never 
> hit. Proper checks need to be added (see the sketch below).
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainersMonitor publishes decimal values for the CPU usage.
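
A minimal sketch of the kind of guard being asked for in the first point. The helper is hypothetical; the UNAVAILABLE constant mirrors ResourceCalculatorProcessTree.UNAVAILABLE but is redefined locally so the sketch is self-contained:

{code}
public class CpuUsageGuardSketch {
  // Mirrors ResourceCalculatorProcessTree.UNAVAILABLE (-1).
  static final float UNAVAILABLE = -1.0f;

  // Only derive the per-node percentage when the raw reading is available,
  // so the UNAVAILABLE sentinel survives to downstream checks.
  static float totalCoresPercentage(float cpuUsagePercentPerCore, int numProcessors) {
    if (cpuUsagePercentPerCore == UNAVAILABLE || numProcessors <= 0) {
      return UNAVAILABLE;
    }
    return cpuUsagePercentPerCore / numProcessors;
  }

  public static void main(String[] args) {
    System.out.println(totalCoresPercentage(UNAVAILABLE, 8)); // -1.0, not -0.125
    System.out.println(totalCoresPercentage(160.0f, 8));      // 20.0
  }
}
{code}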



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4767) Network issues can cause persistent RM UI outage

2016-03-19 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-4767:
---
Attachment: YARN-4767.003.patch

This patch should resolve the checkstyle issues.  I ended up having to refactor 
some existing code to reduce the method sizes.

> Network issues can cause persistent RM UI outage
> 
>
> Key: YARN-4767
> URL: https://issues.apache.org/jira/browse/YARN-4767
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4767.001.patch, YARN-4767.002.patch, 
> YARN-4767.003.patch
>
>
> If a network issue causes an AM web app to resolve the RM proxy's address to 
> something other than what's listed in the allowed proxies list, the 
> AmIpFilter will 302 redirect the RM proxy's request back to the RM proxy.  
> The RM proxy will then consume all available handler threads connecting to 
> itself over and over, resulting in an outage of the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Consider container request size during CS preemption

2016-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198795#comment-15198795
 ] 

Wangda Tan commented on YARN-4390:
--

Hi [~eepayne],

YARN-4108 doesn't solve all the issues. (I planned to solve this together 
with YARN-4108, but YARN-4108 only tackled half of the problem: once containers 
are selected, only preempt useful containers.) We still need to select containers 
more cleverly based on the container request. I have been thinking about this 
recently and plan to make some progress as soon as possible. May I reopen this 
JIRA and take it over from you?

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2016-03-19 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198111#comment-15198111
 ] 

Inigo Goiri commented on YARN-2965:
---

[~rgrandl], [~srikanthkandula], any objections?

> Enhance Node Managers to monitor and report the resource usage on machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-19 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4746:
---
Attachment: 0003-YARN-4746.patch

Attaching the patch after the test case fix

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch, 
> 0003-YARN-4746.patch, 0003-YARN-4746.patch
>
>
> I'm seeing, somewhere in my WS API tests, an error with exception conversion 
> of a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services suggests the same problem.
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by Jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised exception and explicitly converting it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2016-03-19 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197237#comment-15197237
 ] 

Naganarasimha G R commented on YARN-796:


Hi [~jameszhouyi], in 2.6.0 label exclusivity is not supported, and I hope you are 
also aware that labels are supported only in the CapacityScheduler (CS).

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199809#comment-15199809
 ] 

Junping Du commented on YARN-4785:
--

Thanks [~vvasudev], the patches for branch-2.6 and branch-2.7 LGTM. Will commit 
them shortly.

> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, 
> YARN-4785.branch-2.7.001.patch
>
>
> I see an inconsistent value type (String vs. Array) for the "type" field of 
> LeafQueueInfo in the response of the RM REST API - cluster/scheduler.
> As per the spec it should always be a String.
> Here is the sample output (non-relevant fields removed):
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4829) Add support for binary units

2016-03-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197959#comment-15197959
 ] 

Arun Suresh commented on YARN-4829:
---

bq. ..ones in the IEC standard. It's also the format used by Kubernetes...
Ah !!.. Makes sense..

Also, since we are introducing *Mi* here, the following should be fixed as well:
* In {{ResourceInformation}}, the static final {{MEMORY_MB}} field should be 
initialized to {{ResourceInformation.newInstance(MEMORY_URI, "Mi");}}
* In {{ResourcePBImpl#getMemory}}, the argument to {{convert()}} should be 
*Mi*, not *M*.




> Add support for binary units
> 
>
> Key: YARN-4829
> URL: https://issues.apache.org/jira/browse/YARN-4829
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4829-YARN-3926.001.patch, 
> YARN-4829-YARN-3926.002.patch
>
>
> The units conversion util should have support for binary units.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4830) Add support for resource types in the nodemanager

2016-03-19 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4830:

Component/s: (was: resourcemanager)

> Add support for resource types in the nodemanager
> -
>
> Key: YARN-4830
> URL: https://issues.apache.org/jira/browse/YARN-4830
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4829) Add support for binary units

2016-03-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197795#comment-15197795
 ] 

Arun Suresh commented on YARN-4829:
---

The patch looks mostly good. Thanks [~vvasudev].

Couple of minor nits:
* Maybe rename *Mi, Ti, Pi* to *Me, Te, Pe*, or maybe replace everything with *b* 
(Kb, Mb, ..) to signify binary?
* Can we have a test case that converts between binary and non-binary units (K to 
Ki), for e.g.? (See the sketch below.)
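
A sketch of what such a binary/decimal conversion check could exercise; this is a hypothetical helper with a small multiplier table, not the actual UnitsConversionUtil:

{code}
import java.util.HashMap;
import java.util.Map;

public class UnitsConversionSketch {
  // Multipliers for a few decimal (SI) and binary (IEC) prefixes.
  private static final Map<String, Long> MULTIPLIER = new HashMap<>();
  static {
    MULTIPLIER.put("", 1L);
    MULTIPLIER.put("k", 1_000L);
    MULTIPLIER.put("M", 1_000_000L);
    MULTIPLIER.put("G", 1_000_000_000L);
    MULTIPLIER.put("Ki", 1_024L);
    MULTIPLIER.put("Mi", 1_024L * 1_024);
    MULTIPLIER.put("Gi", 1_024L * 1_024 * 1_024);
  }

  // Convert a value from one unit to another, e.g. the K -> Ki case from the nit.
  static long convert(String fromUnit, String toUnit, long value) {
    return value * MULTIPLIER.get(fromUnit) / MULTIPLIER.get(toUnit);
  }

  public static void main(String[] args) {
    System.out.println(convert("Mi", "Ki", 2));    // 2048
    System.out.println(convert("k", "Ki", 1024));  // 1000 (decimal k vs binary Ki)
  }
}
{code}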

> Add support for binary units
> 
>
> Key: YARN-4829
> URL: https://issues.apache.org/jira/browse/YARN-4829
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4829-YARN-3926.001.patch
>
>
> The units conversion util should have support for binary units.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200204#comment-15200204
 ] 

Steve Loughran commented on YARN-4820:
--

ok, I'm wrong, this is a 307, not a 302 ... ignore everything I was complaining 
about

> ResourceManager web redirects in HA mode drops query parameters
> ---
>
> Key: YARN-4820
> URL: https://issues.apache.org/jira/browse/YARN-4820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4820.001.patch, YARN-4820.002.patch
>
>
> The RMWebAppFilter redirects http requests from the standby to the active. 
> However it drops all the query parameters when it does the redirect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt

2016-03-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201669#comment-15201669
 ] 

Sangjin Lee commented on YARN-4839:
---

Could this be the same issue as pointed out by YARN-4247? We did see this issue 
in our environment (which is 2.6 + patches), but that was because we backported 
YARN-2005 without YARN-3361. Not sure if there has been a more recent 
regression.

> ResourceManager deadlock between RMAppAttemptImpl and 
> SchedulerApplicationAttempt
> -
>
> Key: YARN-4839
> URL: https://issues.apache.org/jira/browse/YARN-4839
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> Hit a deadlock in the ResourceManager as one thread was holding the 
> SchedulerApplicationAttempt lock and trying to call 
> RMAppAttemptImpl.getMasterContainer while another thread had the 
> RMAppAttemptImpl lock and was trying to call 
> SchedulerApplicationAttempt.getResourceUsageReport.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-03-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199761#comment-15199761
 ] 

Varun Saxena commented on YARN-4517:


[~gtCarrera9], as per the discussion with Wangda, we can unify the app/container 
information after this branch goes into trunk. I think we can definitely unify 
and have a single container page; I will do this later as part of another JIRA. 
For an NM single-app page, we will have to see. 

Regarding this, 
bq. However, in the new UI, when the node is shutdown, I just could not hold 
myself to try to find a link to the NM logs to figure out why. I think the 
workflow here changed slightly, hence the user experience. Some other projects 
like Apache Ambari may want to maintain those information as well, but in YARN, 
it will be great if we could provide our users a way out. Maybe something like: 
"You have to ssh to the missing nodes' /xxx/ dir to look for the logs" 
would even be helpful.
In the old UI, we did not show anything if a node was shut down. So here, the change 
is that we also show the nodes which have been SHUTDOWN; I thought this might be 
useful info for the admin.
Now, some information about the path where the user can check the logs may be 
useful, but currently this information is not available in the RM. I am not sure how 
the NM can report this to the RM. It could be reported at node registration, but do 
we need to?
Ambari may have this information because I guess it knows exactly where the 
installation was done through it.

Regarding node labels, we can add a field in the REST response indicating whether 
labels are enabled or not. We can do this later because it would require another 
JIRA for REST changes.

The 500 error, I think, you are encountering on the app page. I will fix it while 
doing the AM pages or as a separate JIRA.


> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need nodes page added to next generation web UI, similar to existing 
> RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2016-03-19 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198204#comment-15198204
 ] 

Srikanth Kandula commented on YARN-2965:


Go for it :-)  We have some dummy code that was good enough to get numbers and run 
experiments, but we are not actively working on pushing that in. Inigo, I will 
share that code with you offline so you can pick any useful pieces from it if you 
like.

> Enhance Node Managers to monitor and report the resource usage on machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200736#comment-15200736
 ] 

Hadoop QA commented on YARN-4062:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
12s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
14s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 45s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 23s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_74 with JDK v1.8.0_74 
generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 45s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_95 with JDK v1.7.0_95 
generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} hadoop-yarn-project/hadoop-yarn: patch generated 0 
new + 212 unchanged - 1 fixed = 212 total (was 213) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s 
{color} | {color:green} 

[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198410#comment-15198410
 ] 

Karthik Kambatla commented on YARN-4686:


On a cursory look, the patch looks reasonable. One thing that caught my eye: 
are we explicitly transitioning the RM to active even when HA is not enabled? 
Is that required? 

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, 
> YARN-4686.005.patch, YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4808) SchedulerNode can use a few more cosmetic changes

2016-03-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202938#comment-15202938
 ] 

Karthik Kambatla commented on YARN-4808:


[~leftnoteasy] - will you be able to review this? 

> SchedulerNode can use a few more cosmetic changes
> -
>
> Key: YARN-4808
> URL: https://issues.apache.org/jira/browse/YARN-4808
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4808-1.patch
>
>
> We have made some cosmetic changes to SchedulerNode recently. While working 
> on YARN-4511, I realized we could improve it a little more:
> # Remove volatile variables - don't see the need for them being volatile
> # Some methods end up doing very similar things, so consolidating them
> # Renaming totalResource to capacity. YARN-4511 plans to add inflatedCapacity 
> to include the un-utilized resources, and having two totals can be a little 
> confusing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4815) ATS 1.5 timelineclinet impl try to create attempt directory for every event call

2016-03-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4815:

Attachment: YARN-4815.2.patch

Rebased the patch.

> ATS 1.5 timelineclinet impl try to create attempt directory for every event 
> call
> 
>
> Key: YARN-4815
> URL: https://issues.apache.org/jira/browse/YARN-4815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4815.1.patch, YARN-4815.2.patch
>
>
> ATS 1.5 timelineclinet impl, try to create attempt directory for every event 
> call. Since per attempt only one call to create directory is enough, this is 
> causing perf issue.
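
A minimal sketch of the usual fix pattern for this kind of issue, with made-up names (the cache field and {{createAttemptDir()}} helper are hypothetical, and this is not necessarily what the attached patch does): remember which attempt directories have already been created, so only the first event per attempt pays for the filesystem call.

{code}
// Sketch only: guard directory creation so it happens at most once per attempt.
private final Set<ApplicationAttemptId> attemptDirCreated =
    ConcurrentHashMap.newKeySet();

void ensureAttemptDirExists(ApplicationAttemptId attemptId) throws IOException {
  // add() returns true only for the first caller per attempt id.
  if (attemptDirCreated.add(attemptId)) {
    createAttemptDir(attemptId);
  }
}
{code}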



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4835) [YARN-3368] REST API related changes for new Web UI

2016-03-19 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4835:
---
Description: 
The following things need to be added for the AM-related web pages.

1. Support task state query param in REST URL for fetching tasks.
2. Support task attempt state query param in REST URL for fetching task 
attempts.
3. A new REST endpoint to fetch counters for each task belonging to a job. Also 
have a query param for counter name.
   i.e. something like :
  {{/jobs/\{jobid\}/taskCounters}}
4. A REST endpoint in the NM for fetching all log files associated with a 
container. Useful if logs are served by the NM.

> [YARN-3368] REST API related changes for new Web UI
> ---
>
> Key: YARN-4835
> URL: https://issues.apache.org/jira/browse/YARN-4835
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> The following things need to be added for the AM-related web pages.
> 1. Support task state query param in REST URL for fetching tasks.
> 2. Support task attempt state query param in REST URL for fetching task 
> attempts.
> 3. A new REST endpoint to fetch counters for each task belonging to a job. 
> Also have a query param for counter name.
>i.e. something like :
>   {{/jobs/\{jobid\}/taskCounters}}
> 4. A REST endpoint in the NM for fetching all log files associated with a 
> container. Useful if logs are served by the NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4823) Refactor the nested reservation id field in listReservation to simple string field

2016-03-19 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200852#comment-15200852
 ] 

Subru Krishnan commented on YARN-4823:
--

The test case failures are consistent and unrelated and are covered in YARN-4478

> Refactor the nested reservation id field in listReservation to simple string 
> field
> --
>
> Key: YARN-4823
> URL: https://issues.apache.org/jira/browse/YARN-4823
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-4823-v1.patch
>
>
> The listReservation REST API returns a ReservationId field which has a nested 
> id field which is also called ReservationId. This JIRA proposes to rename the 
> nested field to a string as it's easier to read and moreover what the 
> update/delete APIs take in as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199865#comment-15199865
 ] 

Hadoop QA commented on YARN-4785:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} 
| {color:red} YARN-4785 does not apply to branch-2.6. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12793996/YARN-4785.branch-2.6.001.patch
 |
| JIRA Issue | YARN-4785 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10806/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, 
> YARN-4785.branch-2.7.001.patch
>
>
> I see inconsistent value types (String and Array) for the "type" field of 
> LeafQueueInfo in the response of the RM REST API - cluster/scheduler;
> as per the spec it should always be a String.
> Here is the sample output (non-relevant fields removed):
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-998) Persistent resource change during NM/RM restart

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199584#comment-15199584
 ] 

Junping Du commented on YARN-998:
-

Hi [~jianhe], would you kindly review the patch again? Thanks!

> Persistent resource change during NM/RM restart
> ---
>
> Key: YARN-998
> URL: https://issues.apache.org/jira/browse/YARN-998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-998-sample.patch, YARN-998-v1.patch, 
> YARN-998-v2.patch
>
>
> When NM is restarted by plan or from a failure, previous dynamic resource 
> setting should be kept for consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-03-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200205#comment-15200205
 ] 

Varun Saxena commented on YARN-4517:


[~sunilg],
bq. I will try and see whether we can make a unified patch for all REST changes 
needed for UI together. Will syncup with you offline and will share summary 
here.
Yup. I will raise a JIRA for the REST changes soon. You can update the changes 
you plan to make there.

bq. From RM, we can get the node ip/hostname. Atleast we can give a relative 
patch for getting dir for logs (may be from available path from 
yarn-default.xml). 
Node IP will anyway be shown with the SHUTDOWN node.
Sorry, I do not know which configuration you are referring to that would tell us 
where the nodemanager logs are located.
IIUC, the log file location is decided by the yarn.log.dir system property (which 
is passed while starting the NM daemon), and the RM won't know about it for 
nodemanagers. Currently, if I am not wrong, the RM does not have this info, and 
if we need to support this, code will have to be added.
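
A minimal illustration of why the RM cannot derive this on its own: the location is only visible as a JVM system property inside the NM process. Only the yarn.log.dir property name comes from the discussion above; the helper class and the hadoop.log.dir fallback are assumptions for this sketch.

{code}
// Hypothetical helper; only meaningful when run inside the NM JVM itself.
public final class NMLogDirResolver {
  private NMLogDirResolver() {
  }

  public static String resolveLogDir() {
    // yarn.log.dir is set on the NM daemon's command line at startup.
    String logDir = System.getProperty("yarn.log.dir");
    if (logDir == null) {
      // Assumed fallback for this sketch, not a documented guarantee.
      logDir = System.getProperty("hadoop.log.dir", "logs");
    }
    return logDir;
  }
}
{code}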

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need nodes page added to next generation web UI, similar to existing 
> RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199768#comment-15199768
 ] 

Varun Saxena commented on YARN-4712:


+1 on the latest patch from me too.
From the point of view of aggregation, I agree with using container CPU per 
core.

Before committing this, I think we can discuss with others in the meeting today 
and find out if they agree, and whether we need other metrics at the container 
level.


> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch, 
> YARN-4712-YARN-2928.v1.004.patch, YARN-4712-YARN-2928.v1.005.patch, 
> YARN-4712-YARN-2928.v1.006.patch
>
>
> There are 2 issues with CPU usage collection 
> * I was able to observe that, many times, the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor still 
> does the calculation, i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore 
> /resourceCalculatorPlugin.getNumProcessors()}}, because of which the UNAVAILABLE 
> check in {{NMTimelinePublisher.reportContainerResourceUsage}} is not 
> triggered, so proper checks need to be handled.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainersMonitor publishes decimal values for the CPU usage.
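
A short sketch of the guard the first bullet calls for; apart from the quoted class and method names, everything here (the enclosing method, the {{ResourceCalculatorPlugin}} wiring, and {{publishMetric}}) is a hypothetical stand-in.

{code}
// Sketch: skip publishing CPU metrics when the process tree reports UNAVAILABLE,
// instead of dividing the -1 sentinel by the number of processors.
private void reportCpuUsage(ResourceCalculatorProcessTree pTree,
    ResourceCalculatorPlugin resourceCalculatorPlugin) {
  float cpuUsagePercentPerCore = pTree.getCpuUsagePercent();
  if (cpuUsagePercentPerCore == ResourceCalculatorProcessTree.UNAVAILABLE) {
    return;
  }
  float cpuUsageTotalCoresPercentage =
      cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors();
  // The metric column would also need a converter that can hold decimal values
  // (see the second bullet above).
  publishMetric("CPU", cpuUsageTotalCoresPercentage);
}
{code}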



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199226#comment-15199226
 ] 

Hadoop QA commented on YARN-4746:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 3s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 55s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 4s {color} | 
{color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 41s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit 

[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200882#comment-15200882
 ] 

Varun Vasudev commented on YARN-4785:
-

Thanks [~djp]!

> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Fix For: 2.8.0, 2.7.3, 2.6.5
>
> Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, 
> YARN-4785.branch-2.7.001.patch
>
>
> I see inconsistent value types (String and Array) for the "type" field of 
> LeafQueueInfo in the response of the RM REST API - cluster/scheduler;
> as per the spec it should always be a String.
> Here is the sample output (non-relevant fields removed):
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4560) Make scheduler error checking message more user friendly

2016-03-19 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197617#comment-15197617
 ] 

Ray Chiang commented on YARN-4560:
--

Thanks for the review everyone!  Thanks for the commit Karthik!

> Make scheduler error checking message more user friendly
> 
>
> Key: YARN-4560
> URL: https://issues.apache.org/jira/browse/YARN-4560
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Trivial
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: YARN-4560.001.patch
>
>
> If the YARN properties below are poorly configured:
> {code}
> yarn.scheduler.minimum-allocation-mb
> yarn.scheduler.maximum-allocation-mb
> {code}
> The error message that shows up in the RM is:
> {panel}
> 2016-01-07 14:47:03,711 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid resource 
> scheduler memory allocation configuration, 
> yarn.scheduler.minimum-allocation-mb=-1, 
> yarn.scheduler.maximum-allocation-mb=-3, min should equal greater than 0, max 
> should be no smaller than min.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.validateConf(FairScheduler.java:215)
> {panel}
> While it's technically correct, it's not very user friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4825) Remove redundant code in ClientRMService::listReservations

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200746#comment-15200746
 ] 

Hadoop QA commented on YARN-4825:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 56s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 11s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 153m 7s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles

2016-03-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197811#comment-15197811
 ] 

Arun Suresh commented on YARN-3926:
---

bq. ..how this will affect on running apps
Agreed, it might not be trivial. But my hunch is that, if DRF works correctly, it 
should be equivalent to a cluster capacity / resource change (in the 
FairScheduler, IIRC, a re-calculation of queue and application fair-shares is 
done).

> Extend the YARN resource model for easier resource-type management and 
> profiles
> ---
>
> Key: YARN-3926
> URL: https://issues.apache.org/jira/browse/YARN-3926
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Proposal for modifying resource model and profiles.pdf
>
>
> Currently, there are efforts to add support for various resource-types such 
> as disk(YARN-2139), network(YARN-2140), and  HDFS bandwidth(YARN-2681). These 
> efforts all aim to add support for a new resource type and are fairly 
> involved efforts. In addition, once support is added, it becomes harder for 
> users to specify the resources they need. All existing jobs have to be 
> modified, or have to use the minimum allocation.
> This ticket is a proposal to extend the YARN resource model to a more 
> flexible model which makes it easier to support additional resource-types. It 
> also considers the related aspect of “resource profiles” which allow users to 
> easily specify the various resources they need for any given container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4831) Recovered containers will be killed after NM stateful restart

2016-03-19 Thread Siqi Li (JIRA)
Siqi Li created YARN-4831:
-

 Summary: Recovered containers will be killed after NM stateful 
restart 
 Key: YARN-4831
 URL: https://issues.apache.org/jira/browse/YARN-4831
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2016-03-19 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198442#comment-15198442
 ] 

Daniel Templeton commented on YARN-2962:


Thanks for updating the patch, [~varun_saxena].

The overall approach appears faithful to the discussion above.  One optimization 
I might consider is to only add the splits that actually exist to 
{{rmAppRootHierarchies}}, since I would assume that the common case will be not 
to use splits.

Implementation comments:

Could you also explain in the parameter description why one would want to 
change it from the default of 0 and how to know what a good split value would 
be?

{code}
  static final int NO_APPID_NODE_SPLIT = 0;
{code}

I'm not sure this constant adds anything.  I found it made the code harder to 
read than just hard-coding 0.  At a minimum, the name could be improved.  
Took me a bit to realize you weren't abbreviating "number" as "no".  At the 
absolute barest minimum, a comment to explain the constant would help.


{code}
  private final static class AppNodeSplitInfo {
private final String path;
private final int splitIndex;
AppNodeSplitInfo(String path, int splitIndex) {
  this.path = path;
  this.splitIndex = splitIndex;
}
public String getPath() {
  return path;
}
public int getSplitIndex() {
  return splitIndex;
}
  }
{code}

It may be just me, but for a private holding class like this, I don't think the 
accessors are needed.  Just access the member vars directly.


{code}
if (appIdNodeSplitIndex < 1 || appIdNodeSplitIndex > 4) {
  appIdNodeSplitIndex = NO_APPID_NODE_SPLIT;
}
{code}

This violates the Principle of Least Astonishment.  At least log a warning that 
you're not doing what the user said to.  I might even log it as an error.
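
Something along these lines would do (a sketch, not required wording; {{LOG}} is assumed to be the class's existing logger):

{code}
if (appIdNodeSplitIndex < 1 || appIdNodeSplitIndex > 4) {
  LOG.warn("Invalid application id node split index " + appIdNodeSplitIndex
      + ", must be in the range 1-4; falling back to no split (0).");
  appIdNodeSplitIndex = 0;
}
{code}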


{code}
rmAppRootHierarchies = new HashMap(5);
{code}

should be

{code}
rmAppRootHierarchies = new HashMap<>(5);
{code}


{code}
  if (alternatePathInfo == null) {
// Unexpected. Assume that app attempt has been deleted.
return;
  }
  appIdRemovePath = alternatePathInfo.getPath();
{code}

I'm not a fan of the if-return style of coding.  I'd rather you did:

{code}
  // Assume that app attempt has been deleted if the path info is null
  if (alternatePathInfo != null) {
appIdRemovePath = alternatePathInfo.getPath();
  }
{code}

and then wrap the tail of the method in {{if (appIdRemovePath != null) {}}.  
Same in {{removeApplicationStateInternal()}} and {{removeApplication()}}.
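
Put together, the suggested shape would look roughly like this (the rest of the method is elided and only indicated by a comment):

{code}
String appIdRemovePath = null;
// Assume that the app attempt has been deleted if the path info is null
if (alternatePathInfo != null) {
  appIdRemovePath = alternatePathInfo.getPath();
}
if (appIdRemovePath != null) {
  // ... existing removal logic using appIdRemovePath ...
}
{code}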

Please include messages in your @throws comments.


{code}
  /**
   * Deletes the path. Assumes that path exists.
   * @param path Path to be deleted.
   * @throws Exception
   */
  private void safeDeleteIfExists(final String path) throws Exception {
SafeTransaction transaction = new SafeTransaction();
transaction.delete(path);
transaction.commit();
  }

  /**
   * Deletes the path. Checks for existence of path as well.
   * @param path Path to be deleted.
   * @throws Exception
   */
   private void safeDelete(final String path) throws Exception {
 if (exists(path)) {
  safeDeleteIfExists(path);
 }
   }
{code}

What I see is that {{safeDelete()}} deletes the path if it exists, and 
{{safeDeleteIfExists()}} deletes the path blindly.  Might want to swap those 
method names.
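
I.e., roughly this, with only the names swapped and the bodies left as they are:

{code}
/**
 * Deletes the path blindly. Assumes that the path exists.
 */
private void safeDelete(final String path) throws Exception {
  SafeTransaction transaction = new SafeTransaction();
  transaction.delete(path);
  transaction.commit();
}

/**
 * Deletes the path only if it exists.
 */
private void safeDeleteIfExists(final String path) throws Exception {
  if (exists(path)) {
    safeDelete(path);
  }
}
{code}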


{code}
  private AppNodeSplitInfo getAlternatePath(String appId) throws Exception {
for (Map.Entry entry : rmAppRootHierarchies.entrySet()) {
  // Look for other paths
  int splitIndex = entry.getKey();
  if (splitIndex != appIdNodeSplitIndex) {
String alternatePath =
getLeafAppIdNodePath(appId, entry.getValue(), splitIndex, false);
if (exists(alternatePath)) {
  return new AppNodeSplitInfo(alternatePath, splitIndex);
}
  }
}
return null;
  }
{code}

Naïve question: what happens if an app happens to exist in more than one split? 
I know that's not the expected case, but never underestimate the users...

I would also love to see some use of newlines to make the code a little more 
readable.

I would love to see javadoc comments on your test methods.

{code}
  HashMap attempts =
  new HashMap();
{code}

should be

{code}
  HashMap attempts = new HashMap<>();
{code}


Your assert methods should have some message text to explain what went wrong.

It would be really swell if those two long test methods had some more 
explanatory comments so that they're easier to understand.

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
>

[jira] [Resolved] (YARN-2048) List all of the containers of an application from the yarn web

2016-03-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-2048.

Resolution: Duplicate

> List all of the containers of an application from the yarn web
> --
>
> Key: YARN-2048
> URL: https://issues.apache.org/jira/browse/YARN-2048
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, webapp
>Affects Versions: 2.3.0, 2.4.0, 2.5.0
>Reporter: Min Zhou
>Assignee: Xuan Gong
> Attachments: YARN-2048-trunk-v1.patch
>
>
> Currently, YARN does not provide a way to list all of the containers of an 
> application from its web UI. This kind of information is needed by the 
> application user: they want to conveniently know how many containers their 
> applications have already acquired and which nodes those containers were 
> launched on, and they also want to view the logs of each container of an 
> application.
> One approach is to maintain a container list in RMAppImpl and expose this info 
> on the Application page. I will submit a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2048) List all of the containers of an application from the yarn web

2016-03-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reopened YARN-2048:


> List all of the containers of an application from the yarn web
> --
>
> Key: YARN-2048
> URL: https://issues.apache.org/jira/browse/YARN-2048
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, webapp
>Affects Versions: 2.3.0, 2.4.0, 2.5.0
>Reporter: Min Zhou
>Assignee: Xuan Gong
> Attachments: YARN-2048-trunk-v1.patch
>
>
> Currently, YARN does not provide a way to list all of the containers of an 
> application from its web UI. This kind of information is needed by the 
> application user: they want to conveniently know how many containers their 
> applications have already acquired and which nodes those containers were 
> launched on, and they also want to view the logs of each container of an 
> application.
> One approach is to maintain a container list in RMAppImpl and expose this info 
> on the Application page. I will submit a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202474#comment-15202474
 ] 

Hadoop QA commented on YARN-4686:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 10s {color} 
| {color:red} YARN-4686 does not apply to branch-2.7. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794247/YARN-4686-branch-2.7.006.patch
 |
| JIRA Issue | YARN-4686 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10820/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, 
> YARN-4686-branch-2.7.006.patch, YARN-4686.001.patch, YARN-4686.002.patch, 
> YARN-4686.003.patch, YARN-4686.004.patch, YARN-4686.005.patch, 
> YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4829) Add support for binary units

2016-03-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200878#comment-15200878
 ] 

Varun Vasudev commented on YARN-4829:
-

[~asuresh] - can you take a look at the latest patch and commit it to branch 
YARN-3926 if it looks good? Thanks!

> Add support for binary units
> 
>
> Key: YARN-4829
> URL: https://issues.apache.org/jira/browse/YARN-4829
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4829-YARN-3926.001.patch, 
> YARN-4829-YARN-3926.002.patch, YARN-4829-YARN-3926.003.patch, 
> YARN-4829-YARN-3926.004.patch
>
>
> The units conversion util should have support for binary units.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4636) Make blacklist tracking policy pluggable for more extensions.

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201206#comment-15201206
 ] 

Vinod Kumar Vavilapalli commented on YARN-4636:
---

-1 for something like this without understanding the use cases. IMO, the "AM 
blacklisting" doesn't even need to be user-visible (YARN-4837), let alone be 
pluggable.

> Make blacklist tracking policy pluggable for more extensions.
> -
>
> Key: YARN-4636
> URL: https://issues.apache.org/jira/browse/YARN-4636
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Sunil G
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4390) Consider container request size during CS preemption

2016-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4390:
-
Attachment: YARN-4390.poc-WIP.1.patch

Also uploaded a WIP POC patch if you're interested in seeing what the patch will 
look like (the YARN-4822 patch needs to be applied first). I haven't done any 
testing yet, so there is no guarantee that it even compiles.

Could you please share your thoughts about the proposal? [~eepayne], [~sunilg], 
[~curino].

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390.poc-WIP.1.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-4837:
-

 Summary: User facing aspects of 'AM blacklisting' feature need 
fixing
 Key: YARN-4837
 URL: https://issues.apache.org/jira/browse/YARN-4837
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.

Looking at the 'AM blacklisting feature', I see several things to be fixed 
before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-03-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200414#comment-15200414
 ] 

Sangjin Lee commented on YARN-4062:
---

LGTM pending Jenkins. I'll commit it once Jenkins comes back clean.

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.04.patch, 
> YARN-4062-YARN-2928.05.patch, YARN-4062-YARN-2928.06.patch, 
> YARN-4062-YARN-2928.07.patch, YARN-4062-YARN-2928.08.patch, 
> YARN-4062-YARN-2928.09.patch, YARN-4062-YARN-2928.1.patch, 
> YARN-4062-feature-YARN-2928.01.patch, YARN-4062-feature-YARN-2928.02.patch, 
> YARN-4062-feature-YARN-2928.03.patch
>
>
> As part of YARN-3901, coprocessor and scanner is being added for storing into 
> the flow_run table. It also needs a flush & compaction processing in the 
> coprocessor and perhaps a new scanner to deal with the data during flushing 
> and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197235#comment-15197235
 ] 

Varun Vasudev commented on YARN-4820:
-

[~steve_l] - sorry I didn't understand the case you mentioned. You're talking 
about a scenario where the active RM web services redirect you to a standby RM?

> ResourceManager web redirects in HA mode drops query parameters
> ---
>
> Key: YARN-4820
> URL: https://issues.apache.org/jira/browse/YARN-4820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4820.001.patch
>
>
> The RMWebAppFilter redirects http requests from the standby to the active. 
> However it drops all the query parameters when it does the redirect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4823) Refactor the nested reservation id field in listReservation to simple string field

2016-03-19 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4823:
-
Attachment: YARN-4823-v1.patch

Attaching a patch that refactors the nested reservation id field in 
listReservation to a simple string field.

> Refactor the nested reservation id field in listReservation to simple string 
> field
> --
>
> Key: YARN-4823
> URL: https://issues.apache.org/jira/browse/YARN-4823
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-4823-v1.patch
>
>
> The listReservation REST API returns a ReservationId field which has a nested 
> id field which is also called ReservationId. This JIRA proposes to rename the 
> nested field to a string as it's easier to read and moreover what the 
> update/delete APIs take in as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201265#comment-15201265
 ] 

Hadoop QA commented on YARN-4746:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 12s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 44s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 8s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 53s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 20s {color} 
| {color:red} 

[jira] [Commented] (YARN-4502) Fix two AM containers get allocated when AM restart

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200423#comment-15200423
 ] 

Vinod Kumar Vavilapalli commented on YARN-4502:
---

[~zxu] / [~djp]
bq. It looks like the implementation for 
AbstractYarnScheduler#getApplicationAttempt(ApplicationAttemptId 
applicationAttemptId) is also confusing.
This is by design - see YARN-1041 - we want to route all the events destined 
for AppAttempt *only* to the current attempt. We should just document this and 
move on.

> Fix two AM containers get allocated when AM restart
> ---
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-4502-20160114.txt, YARN-4502-20160212.txt
>
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for new AM pid 
> Here, the 2nd AM container was supposed to be started on 
> container_e12_1450825622869_0001_02_01, but the AM was not launched on 
> container_e12_1450825622869_0001_02_01; it was in the ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: the RM should not start 2 containers for starting the AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-03-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201546#comment-15201546
 ] 

Sunil G commented on YARN-4837:
---

Thanks [~vinodkv] for pitching in.

YARN-2005 blacklists nodes when an AM container launch fails due to DISK_FAILED. 
After YARN-4284, blacklisting for AM container failures applies to all container 
failures except PREEMPTED. There was some discussion about the use cases for 
this change.

If the blacklisting (AM container failure) feature is enabled at the cluster 
level, all applications are forced to comply with the blacklisting rule. 
YARN-4389 also added an option to disable this feature from the application end, 
and to adjust the threshold if it is too strict (or too lenient). I agree with 
your point that it is early for users to make blacklisting decisions without 
much needed/useful information, but given the current aggressive behavior, that 
change at least helped applications skip the feature.

I agree this has to be a controllable feature that does not cause problems in a 
busy cluster. I think a time-based purging solution might be ideal, to allow the 
same app to use the node again.

> User facing aspects of 'AM blacklisting' feature need fixing
> 
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed 
> before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201146#comment-15201146
 ] 

Vinod Kumar Vavilapalli commented on YARN-2005:
---

-1 for backporting this. While I understand that the original feature ask is 
useful for avoiding AM scheduling getting blocked, there are far too many 
issues with the feature as it stands. Please see my comments on YARN-4576 and 
YARN-4837.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch, YARN-2005.009.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200374#comment-15200374
 ] 

Hadoop QA commented on YARN-4820:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 7s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 34s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 41s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 44s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 302m 49s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK 

[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200385#comment-15200385
 ] 

Hadoop QA commented on YARN-4595:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 5s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 40s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 20s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794016/YARN-4595.2.patch |
| JIRA Issue | YARN-4595 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b1bf64ea3a7a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / dc951e6 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  

[jira] [Created] (YARN-4828) Create a pull request template for github

2016-03-19 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-4828:


 Summary: Create a pull request template for github
 Key: YARN-4828
 URL: https://issues.apache.org/jira/browse/YARN-4828
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: build
Affects Versions: 3.0.0
 Environment: github
Reporter: Steve Loughran
Priority: Minor


We're starting to see PRs appear without any JIRA reference, explanation, etc. 
These are going to be ignored without that information.

It's possible to [create a PR text 
template](https://help.github.com/articles/creating-a-pull-request-template-for-your-repository/)
 under {{.github/PULL_REQUEST_TEMPLATE}}

We could add such a template, which prompts for summary points such as:

* which JIRA
* if it's against an object store, how did you test it?
* if it's a shell script, how did you test it?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1508) Document Dynamic Resource Configuration feature

2016-03-19 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1508:
-
Summary: Document Dynamic Resource Configuration feature  (was: Rename 
ResourceOption and document resource over-commitment cases)

> Document Dynamic Resource Configuration feature
> ---
>
> Key: YARN-1508
> URL: https://issues.apache.org/jira/browse/YARN-1508
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>
> Per Vinod's comment in 
> YARN-312(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087)
>  and Bikas' comment in 
> YARN-311(https://issues.apache.org/jira/browse/YARN-311?focusedCommentId=13848615),
>  the name of ResourceOption is not descriptive enough to be easily understood. Also, we 
> need to document more on resource over-commitment timing and use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197561#comment-15197561
 ] 

Hadoop QA commented on YARN-4820:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 28s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 35s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 42s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 45s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | 

[jira] [Created] (YARN-4832) NM side resource value should get updated if change applied in RM side

2016-03-19 Thread Junping Du (JIRA)
Junping Du created YARN-4832:


 Summary: NM side resource value should get updated if change 
applied in RM side
 Key: YARN-4832
 URL: https://issues.apache.org/jira/browse/YARN-4832
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical


Now, if we execute the CLI to update node resources (for a single node or 
multiple nodes) on the RM side, the NM will not receive any notification. It 
doesn't affect resource scheduling but makes the resource usage metrics 
reported by the NM a bit weird. We should sync up the new resource values 
between the RM and the NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4711) NM is going down with NPE's due to single thread processing of events by Timeline client

2016-03-19 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197864#comment-15197864
 ] 

Naganarasimha G R commented on YARN-4711:
-

offline comments from [~sjlee0] : 
{quote}
 The current state of NM timeline integration seems to have quite a few rough 
edges. I did look at the exceptions and here are my early thoughts. I agree 
with your other points.

(1) NPE in {{NMTimelinePublisher$ContainerEventHandler}}
I understand that this happens because the event is handled after the container 
object was removed in the NM context, correct? As a rule, I think any attempt 
to retrieve objects from the NM context in the async event handler is 
inherently dangerous because there is no guarantee that those objects are still 
there in the context. So we should review the {{NMTimelinePublisher}} code to 
spot those cases. This is one of them.

What this event handler needs is the container's resource and priority. What I 
would suggest is to add the resource and priority into the event itself. I'm 
not sure if we need to subclass {{ContainerEvent}} for this purpose... Thoughts?

(2) NPE in {{NMTimelinePublisher.putEntity()}}
This is the other place in {{NMTimelinePublisher}} where it attempts to 
retrieve an object from the context, and it fails for a similar reason. My 
question when I looked at this is, who should own {{TimelineClient}}s? 
Currently they are owned by the individual {{ApplicationImpl}} instances. I'm 
not sure if we went back and forth on this, but if {{ApplicationImpl}} goes 
away but we still need to publish, there doesn't seem to be a way. Since it's 
really {{NMTimelinePublisher}} that needs the timeline clients, should they be 
owned and managed by {{NMTImelinePublisher}}? I know it might be a rather big 
change, but I'm not sure if there is any other way to resolve this.
{quote}
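
To make suggestion (1) above concrete, a rough sketch of carrying the resource 
and priority in the event itself could look like the following. The class name 
and its exact placement are illustrative assumptions only, not the agreed fix:

{code}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType;

// Carries the data the publisher needs so the async handler never has to look
// the container up in the NM context after it may already have been removed.
public class ContainerResourceEvent extends ContainerEvent {
  private final Resource resource;
  private final Priority priority;

  public ContainerResourceEvent(ContainerId containerId,
      ContainerEventType eventType, Resource resource, Priority priority) {
    super(containerId, eventType);
    this.resource = resource;
    this.priority = priority;
  }

  public Resource getResource() {
    return resource;
  }

  public Priority getPriority() {
    return priority;
  }
}
{code}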

> NM is going down with NPE's due to single thread processing of events by 
> Timeline client
> 
>
> Key: YARN-4711
> URL: https://issues.apache.org/jira/browse/YARN-4711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: 4711Analysis.txt
>
>
> After YARN-3367, while testing the latest 2928 branch, I came across a few 
> NPEs due to which the NM is shutting down.
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> On analysis, found that there was a delay in the processing of events, as 
> after YARN-3367 all the events were getting processed by a single thread 
> inside the timeline client. 
> Additionally, found one scenario where there is a possibility of an NPE:
> * TimelineEntity.toString() when {{real}} is not null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4768) getAvailablePhysicalMemorySize can be inaccurate on linux

2016-03-19 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199732#comment-15199732
 ] 

Nathan Roberts commented on YARN-4768:
--

Any comments on this approach?


> getAvailablePhysicalMemorySize can be inaccurate on linux
> -
>
> Key: YARN-4768
> URL: https://issues.apache.org/jira/browse/YARN-4768
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
> Environment: Linux
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4768.patch
>
>
> Algorithm currently uses "MemFree" + "Inactive" from /proc/meminfo
> "Inactive" may not be a very good indication of how much memory can be 
> readily freed because it contains both:
> - Pages mapped with MAP_SHARED|MAP_ANONYMOUS (regardless of whether they're 
> being actively accessed or not. Unclear to me why this is the case...)
> - Pages mapped MAP_PRIVATE|MAP_ANONYMOUS that have not been accessed recently
> Both of these types of pages probably shouldn't be considered "Available".
> "Inactive(file)" would seem more accurate but it's not available in all 
> kernel versions. To keep things simple, maybe just use "Inactive(file)" if 
> available, otherwise fallback to "Inactive".
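
For illustration only, a minimal sketch of the proposed fallback, written as a 
standalone helper rather than the actual NodeManager code (the field names are 
as they appear in /proc/meminfo; the class itself is made up for the example):

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public final class AvailableMemEstimator {
  // Estimates available physical memory in kB:
  // MemFree + Inactive(file) if the kernel exposes it, else MemFree + Inactive.
  public static long availableKb() throws IOException {
    Map<String, Long> fields = new HashMap<>();
    try (BufferedReader reader =
             new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Lines look like "MemFree:         123456 kB"
        String[] parts = line.split("\\s+");
        if (parts.length < 2) {
          continue;
        }
        fields.put(parts[0].replace(":", ""), Long.parseLong(parts[1]));
      }
    }
    long inactive = fields.containsKey("Inactive(file)")
        ? fields.get("Inactive(file)")
        : fields.getOrDefault("Inactive", 0L);
    return fields.getOrDefault("MemFree", 0L) + inactive;
  }

  private AvailableMemEstimator() {
  }
}
{code}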



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4822) Refactor existing Preemption Policy of CS for easier adding new approach to select preemption candidates

2016-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4822:
-
Attachment: YARN-4822.1.patch

Attached ver.1 patch for review.

> Refactor existing Preemption Policy of CS for easier adding new approach to 
> select preemption candidates
> 
>
> Key: YARN-4822
> URL: https://issues.apache.org/jira/browse/YARN-4822
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4822.1.patch
>
>
> Currently, ProportionalCapacityPreemptionPolicy has hard-coded logic to 
> select candidates to be preempted (based on FIFO order of 
> applications/containers). It's not simple to add new candidate-selection 
> logic, such as preemption for large containers, intra-queue fairness/policy, 
> etc.
> In this JIRA, I propose to do the following changes:
> 1) Clean up the code base and consolidate the current logic into 3 stages:
> - Compute the ideal sharing of queues
> - Select to-be-preempted candidates
> - Send preemption/kill events to the scheduler
> 2) Add a new interface, {{PreemptionCandidatesSelectionPolicy}}, for the 
> "select to-be-preempted candidates" part above. Move the existing 
> candidate-selection logic to {{FifoPreemptionCandidatesSelectionPolicy}}. 
> 3) Allow multiple PreemptionCandidatesSelectionPolicies to work together in a 
> chain. A preceding PreemptionCandidatesSelectionPolicy has higher priority to 
> select candidates, and a later PreemptionCandidatesSelectionPolicy can make 
> decisions according to the already selected candidates and the pre-computed 
> ideal shares of queue resources.
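
As a rough illustration of points 2) and 3) above, the interface and chaining 
could look something like the sketch below. Representing candidates as plain 
container IDs and the exact method signature are simplifying assumptions, not 
the committed API:

{code}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

interface PreemptionCandidatesSelectionPolicy {
  // Adds this policy's picks to 'selected', honoring what preceding
  // policies in the chain have already chosen.
  void selectCandidates(Set<String> selected);
}

class PreemptionPolicyChain {
  private final List<PreemptionCandidatesSelectionPolicy> policies =
      new ArrayList<>();

  void add(PreemptionCandidatesSelectionPolicy policy) {
    policies.add(policy);
  }

  Set<String> selectAll() {
    // Earlier policies have higher priority; later ones see prior selections
    // plus any pre-computed ideal shares they were constructed with.
    Set<String> selected = new LinkedHashSet<>();
    for (PreemptionCandidatesSelectionPolicy policy : policies) {
      policy.selectCandidates(selected);
    }
    return selected;
  }
}
{code}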



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4823) Refactor the nested reservation id field in listReservation to simple string field

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200786#comment-15200786
 ] 

Hadoop QA commented on YARN-4823:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 36s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 58s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 152m 46s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794057/YARN-4823-v1.patch |
| JIRA Issue | YARN-4823 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| 

[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198081#comment-15198081
 ] 

Eric Payne commented on YARN-4686:
--

Thanks, [~ebadger], for this patch.
+1 LGTM.

I will wait a day or so to give [~jlowe], [~kasha], and others to comment. Then 
will commit if no further concerns.

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, 
> YARN-4686.005.patch, YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4831) Recovered containers will be killed after NM stateful restart

2016-03-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-4831:
--
Description: 
{code}
2016-03-04 19:43:48,130 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1456335621285_0040_01_66 transitioned from NEW to DONE
2016-03-04 19:43:48,130 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=henkins-service   
   OPERATION=Container Finished - Killed   TARGET=ContainerImpl
RESULT=SUCCESS  APPID=application_1456335621285_0040
{code}

> Recovered containers will be killed after NM stateful restart 
> --
>
> Key: YARN-4831
> URL: https://issues.apache.org/jira/browse/YARN-4831
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>
> {code}
> 2016-03-04 19:43:48,130 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1456335621285_0040_01_66 transitioned from NEW to 
> DONE
> 2016-03-04 19:43:48,130 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=henkins-service 
>OPERATION=Container Finished - Killed   TARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1456335621285_0040
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-03-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200211#comment-15200211
 ] 

Varun Saxena commented on YARN-4517:


Filed YARN-4835

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need nodes page added to next generation web UI, similar to existing 
> RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-4390) Consider container request size during CS preemption

2016-03-19 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reopened YARN-4390:
--
  Assignee: Wangda Tan  (was: Eric Payne)

{quote}
Since YARN-4108 doesn't solve all the issues. (I planned to solve this together 
with YARN-4108, but YARN-4108 only tackled half of the problem: when containers 
selected, only preempt useful containers). However, we need select container 
more clever based on requirement. I'm thinking about this recently and I plan 
to make some progresses as soon as possible. May I reopen this JIRA and take 
over from you?
{quote}
[~leftnoteasy], I had forgotten that we had closed this JIRA in favor of 
YARN-4108. Yes, I had noticed that the selection of containers to preempt in 
YARN-4108 does not actually consider the properties of the needed resources, 
like size or locality. Even so, YARN-4108 is a big improvement and does prevent 
unnecessary preemption. However, you are correct, implementing this JIRA would 
eliminate some extra event passing and processing if killable containers are 
rejected over and over.

I am reopening and assigning to you.

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-19 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4746:
---
Attachment: 0003-YARN-4746.patch

[~ste...@apache.org]
Thank you for the review. Uploading a patch with the below changes:
# testInvalidAppAttempts corrected to check an invalid attempt; earlier it was 
checking an invalid appId
# Moved parse validation for applicationId to WebAppUtil and did the related 
refactoring
Please review.

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch, 
> 0003-YARN-4746.patch
>
>
> I'm seeing somewhere in the WS API tests of mine an error with exception 
> conversion of  a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services implies a same problem
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised argument and explicitly converting it
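
For illustration, a hedged sketch of the conversion described above, assuming 
the existing org.apache.hadoop.yarn.webapp.BadRequestException is used to 
surface the 400 (the helper class below is made up for the example and is not 
the actual patch code):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.webapp.BadRequestException;

public final class AppIdParsing {
  // Parses the appId parameter, turning parse failures into an HTTP 400
  // instead of letting IllegalArgumentException surface as a 500.
  public static ApplicationId parseOrBadRequest(String appIdStr) {
    try {
      return ConverterUtils.toApplicationId(appIdStr);
    } catch (IllegalArgumentException e) {
      throw new BadRequestException("Invalid application id: " + appIdStr);
    }
  }

  private AppIdParsing() {
  }
}
{code}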



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200116#comment-15200116
 ] 

Hudson commented on YARN-4785:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9473 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9473/])
YARN-4785. inconsistent value type of the type field for LeafQueueInfo 
(junping_du: rev ca8106d2dd03458944303d93679daa03b1d82ad5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java


> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Fix For: 2.8.0, 2.7.3, 2.6.5
>
> Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, 
> YARN-4785.branch-2.7.001.patch
>
>
> I see an inconsistent value type (String and Array) for the "type" field of 
> LeafQueueInfo in the response of the RM REST API - cluster/scheduler.
> As per the spec, it should always be String.
> here is the sample output ( removed non-relevant fields )
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4576) Enhancement for tracking Blacklist in AM Launching

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201139#comment-15201139
 ] 

Vinod Kumar Vavilapalli commented on YARN-4576:
---

Please read my comments on YARN-4837; this whole "AM blacklisting" feature is 
unnecessarily blown way out of proportion - we just don't need this amount of 
complexity. Adding more functionality like global lists (YARN-4635), per-user 
lists (YARN-4790), pluggable blacklisting ((!)) (YARN-4636) etc. will make 
things far worse.

Containers are marked DISKS_FAILED only if all the disks have become bad, in 
which case the node itself becomes unhealthy. So there is no need for 
blacklisting per app at all!!

If an AM is killed due to memory overflow, blacklisting the node will not help 
at all!

Overall, like I commented on [the JIRA 
YARN-4790|https://issues.apache.org/jira/browse/YARN-4790?focusedCommentId=15191217=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15191217],
 what we need is to not penalize applications for system-related issues. When 
YARN finds a node with configuration / permission issues, it should itself take 
action to (a) avoid scheduling on that node, (b) alert administrators, etc. 
Implementing heuristics for app/user-level blacklisting to work around platform 
problems should be a last-ditch effort. We did that in Hadoop 1 MapReduce as we 
didn't have a clear demarcation between app and system failures. But that isn't 
the case with YARN - part of the reason why we never implemented 
heuristics-based per-app blacklisting in YARN - we left that completely up to 
applications.





> Enhancement for tracking Blacklist in AM Launching
> --
>
> Key: YARN-4576
> URL: https://issues.apache.org/jira/browse/YARN-4576
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: EnhancementAMLaunchingBlacklist.pdf
>
>
> Before YARN-2005, YARN's blacklist mechanism tracked bad nodes from the AM 
> side: if the AM's attempts to launch containers on a specific node failed 
> several times, the AM would blacklist that node in future resource requests. 
> This mechanism works fine for normal containers. However, from our 
> observation of several clusters: if a problematic node fails to launch the 
> AM, the RM could pick the same problematic node for the next AM attempts 
> again and again, causing application failure in case the other, functional 
> nodes are busy. In the normal case, the customized health checker script is 
> not sensitive enough to mark a node as unhealthy when only one or two 
> container launches fail. 
> After YARN-2005, we can have a BlacklistManager in each RMApp, so nodes on 
> which AM attempts failed before for a specific application get blacklisted. 
> To avoid the risk of all nodes being blacklisted by the BlacklistManager, a 
> disable-failure-threshold is involved to stop adding more nodes to the 
> blacklist once a certain ratio is hit. 
> There are already some enhancements to this AM blacklist mechanism: YARN-4284 
> addresses the wider set of AM container launch failures, and YARN-4389 makes 
> the configuration settings changeable by the app to meet app-specific 
> requirements. However, there are still several gaps to address more scenarios:
> 1. We may need a global blacklist instead of each app maintaining a separate 
> one. The reason is that an AM has a higher chance of failing where other AMs 
> have failed before. A quick example: in a busy cluster, all nodes are busy 
> except two problematic nodes, node a and node b; app1 has already been 
> submitted and failed two AM attempts on a and b. app2 and other apps should 
> wait for the other busy nodes rather than waste attempts on these two 
> problematic nodes.
> 2. If an AM container failure is recognized as a global event instead of an 
> app's own issue, we should consider making the blacklist not permanent but 
> bounded by a specific time window. 
> 3. We could have user-defined blacklist policies to address more possible 
> cases and scenarios, so it is reasonable to make the blacklist policy 
> pluggable.
> 4. For some test scenarios, we could have a whitelist mechanism for AM 
> launching.
> 5. A minor issue: it sounds like an NM reconnect won't refresh the blacklist 
> so far.
> Will try to address all of these issues here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-998) Persistent resource change during NM/RM restart

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197345#comment-15197345
 ] 

Junping Du commented on YARN-998:
-

Thanks Jian for review and comments.
bq. DynamicResourceConfiguration(configuration, true), the second parameter is 
not needed because it’s always passing ‘true’;
Nice catch! Will remove it in next patch.

bq. instead of reload the config again, looks like we can just call 
resourceTrackerServce.set(newConf) to replace the config? newConfig is reloaded 
earlier in the same call path.
I thought of this before, but my original concern was that it is a bit risky to 
have an API that replaces the config with whatever comes in. Will update it if 
this is not a valid concern.

> Persistent resource change during NM/RM restart
> ---
>
> Key: YARN-998
> URL: https://issues.apache.org/jira/browse/YARN-998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-998-sample.patch, YARN-998-v1.patch
>
>
> When NM is restarted by plan or from a failure, previous dynamic resource 
> setting should be kept for consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-03-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199784#comment-15199784
 ] 

Sunil G commented on YARN-4517:
---

[~varun_saxena]
bq.Regarding node labels, we can add it in REST response indicating if labels 
are enabled or not. We can do this later because this would require another 
JIRA for REST changes

For this, I have done some changes so that we can immediately know whether 
labels are in the cluster (I am finishing up the node-label page now and will 
upload a patch soon). I will try to see whether we can make a unified patch for 
all the REST changes needed for the UI. Will sync up with you offline and share 
a summary here.

bq.Maybe something like: "You have to ssh to the missing nodes' /xxx/ dir 
to look for the logs" 
I am +1 for giving more information. From the RM, we can get the node 
IP/hostname. At least we can give a relative path for the log directory (maybe 
from a path available in yarn-default.xml). Users can change it, so the message 
can be a possible suggestion.


> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need nodes page added to next generation web UI, similar to existing 
> RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-03-19 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198218#comment-15198218
 ] 

Daniel Templeton commented on YARN-4311:


I don't have any further comments. LGTM.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v10.patch, 
> YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, 
> YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch, YARN-4311-v9.patch
>
>
> In order to fully forget about a node, removing the node from the include and 
> exclude lists is not sufficient. The RM lists it under Decomm-ed nodes. The 
> tricky part that [~jlowe] pointed out was the case when include lists are not 
> used; in that case we don't want the nodes to fall off if they are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4829) Add support for binary units

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200326#comment-15200326
 ] 

Hadoop QA commented on YARN-4829:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 27s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
42s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
43s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 19s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 
7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | 

[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197845#comment-15197845
 ] 

Hadoop QA commented on YARN-4686:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 39s | trunk passed |
| +1 | compile | 1m 49s | trunk passed with JDK v1.8.0_74 |
| +1 | compile | 2m 6s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 33s | trunk passed |
| +1 | mvnsite | 1m 10s | trunk passed |
| +1 | mvneclipse | 0m 38s | trunk passed |
| +1 | findbugs | 1m 49s | trunk passed |
| +1 | javadoc | 0m 46s | trunk passed with JDK v1.8.0_74 |
| +1 | javadoc | 0m 53s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 58s | the patch passed |
| +1 | compile | 1m 50s | the patch passed with JDK v1.8.0_74 |
| +1 | javac | 1m 50s | the patch passed |
| +1 | compile | 2m 5s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 2m 5s | the patch passed |
| -1 | checkstyle | 0m 31s | hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) |
| +1 | mvnsite | 1m 5s | the patch passed |
| +1 | mvneclipse | 0m 35s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 20s | the patch passed |
| +1 | javadoc | 0m 40s | the patch passed with JDK v1.8.0_74 |
| +1 | javadoc | 0m 49s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 9m 0s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_74. |
| -1 | unit | 6m 20s | hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_74. |
| -1 | unit | 63m 18s | hadoop-yarn-client in the patch failed with JDK v1.8.0_74. |
| +1 | unit | 9m 33s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. |
| -1 | unit | 6m 30s | hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. |
| -1 | unit |

[jira] [Updated] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4686:
--
Attachment: YARN-4686-branch-2.7.006.patch

[~eepayne] Attaching the branch-2.7 patch. It passed all of the tests locally 
on my machine. 

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, 
> YARN-4686-branch-2.7.006.patch, YARN-4686.001.patch, YARN-4686.002.patch, 
> YARN-4686.003.patch, YARN-4686.004.patch, YARN-4686.005.patch, 
> YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}
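
The intermittent failure quoted above is a race of a common shape: the test queries the RM for live nodes before all NodeManagers of the mini cluster have registered. A minimal sketch of the usual guard for such races is below; it assumes Guava's Supplier-based GenericTestUtils.waitFor is on the test classpath, and the node count and timeouts are illustrative only.

{code}
// Sketch only: wait until the RM actually reports the expected number of
// registered NodeManagers before running assertions that depend on them.
import java.util.concurrent.TimeoutException;
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterTestUtil {
  public static void waitForLiveNodes(final MiniYARNCluster cluster,
      final int expectedNodes) throws TimeoutException, InterruptedException {
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        // getRMContext().getRMNodes() holds one entry per registered NM.
        return cluster.getResourceManager().getRMContext()
            .getRMNodes().size() >= expectedNodes;
      }
    }, 100, 60000);  // poll every 100 ms, give up after 60 s
  }
}
{code}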



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Jayesh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197781#comment-15197781
 ] 

Jayesh commented on YARN-4785:
--

+1 (thanks for explaining the solution in the code comment)

> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Attachments: YARN-4785.001.patch
>
>
> I see an inconsistent value type (String and Array) for the "type" field of 
> LeafQueueInfo in the response of the RM REST API - cluster/scheduler.
> As per the spec it should always be String.
> Here is a sample output (non-relevant fields removed):
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}
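
Until the serialization itself is made consistent, clients consuming this endpoint have to tolerate both shapes of the "type" field. The snippet below is only an illustrative client-side normalization and is not part of the patch; it assumes Jackson is available, and the class and method names other than Jackson's own are hypothetical.

{code}
// Sketch: accept "type" as either a JSON string or a single-element array.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class QueueTypeReader {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  /** Returns the queue "type" whether it was serialized as a string or an array. */
  public static String readType(String queueJson) throws Exception {
    JsonNode queue = MAPPER.readTree(queueJson);
    JsonNode type = queue.get("type");
    if (type == null) {
      return null;                       // parent queues may carry no "type" field
    }
    return type.isArray() ? type.get(0).asText() : type.asText();
  }
}
{code}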



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt

2016-03-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201578#comment-15201578
 ] 

Jason Lowe commented on YARN-4839:
--

Stack trace of the relevant threads:
{noformat}
"IPC Server handler 32 on 8030" #153 daemon prio=5 os_prio=0 
tid=0x7fb649603800 nid=0x20b1 waiting on condition [0x7fb5888d2000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00036de978f0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getMasterContainer(RMAppAttemptImpl.java:779)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:467)
- locked <0x00032a106f00> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:278)
- locked <0x00032a106f00> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1008)
- locked <0x00032a106f00> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:534)
- locked <0x000383ce08b0> (a 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server.call(Server.java:2267)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)

[...]

"1413615244@qtp-1677286081-37" #40337 daemon prio=5 os_prio=0 
tid=0x7fb62c089800 nid=0x1b8d waiting for monitor entry [0x7fb5ca40e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceUsageReport(SchedulerApplicationAttempt.java:580)
- waiting to lock <0x00032a106f00> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:267)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:826)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:580)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:815)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:457)
at sun.reflect.GeneratedMethodAccessor142.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 

[jira] [Commented] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI

2016-03-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197668#comment-15197668
 ] 

Sunil G commented on YARN-4751:
---

Yes, agreed on the point that we have to aggregate and peek into multiple 
patches to get the functionality. If 2.7 doesn't need new features or 
enhancements for labels, we can bring in patches on a use-case basis. 
cc/[~wangda.tan]

> In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
> ---
>
> Key: YARN-4751
> URL: https://issues.apache.org/jira/browse/YARN-4751
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: 2.7 CS UI No BarGraph.jpg, 
> YARH-4752-branch-2.7.001.patch, YARH-4752-branch-2.7.002.patch
>
>
> In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs 
> separated by partition. When applications are running on a labeled queue, no 
> color is shown in the bar graph, and several of the "Used" metrics are zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4836) [YARN-3368] Add AM related pages

2016-03-19 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4836:
--

 Summary: [YARN-3368] Add AM related pages
 Key: YARN-4836
 URL: https://issues.apache.org/jira/browse/YARN-4836
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Reporter: Varun Saxena
Assignee: Varun Saxena






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt

2016-03-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201680#comment-15201680
 ] 

Jason Lowe commented on YARN-4839:
--

This appears to have been fixed as a side effect of YARN-3361, which wasn't in 
the build that reproduced this issue. That change updated getMasterContainer 
to avoid locking the RMAppAttemptImpl.
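
For reference, the deadlock here is the classic inconsistent lock ordering: one thread holds the scheduler-side monitor and then asks for the attempt's read lock, while another holds the attempt's lock and then asks for the scheduler-side monitor. The fix direction mentioned above, letting getMasterContainer return without taking the attempt's lock, breaks that cycle. A minimal, illustrative sketch of the idea follows; the class and field names are simplified stand-ins, not the actual YARN classes.

{code}
// Illustrative only -- simplified names, not the actual YARN classes.
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AppAttempt {
  // Publishing the master container through a volatile field lets readers see
  // the latest value without acquiring the attempt's read lock, so a thread
  // that already holds a scheduler lock can call getMasterContainer() without
  // creating a lock-ordering cycle.
  private volatile Object masterContainer;
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  Object getMasterContainer() {
    return masterContainer;            // no lock: a single reference read is safe
  }

  void setMasterContainer(Object container) {
    lock.writeLock().lock();           // writers still serialize state changes
    try {
      masterContainer = container;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}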

> ResourceManager deadlock between RMAppAttemptImpl and 
> SchedulerApplicationAttempt
> -
>
> Key: YARN-4839
> URL: https://issues.apache.org/jira/browse/YARN-4839
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> Hit a deadlock in the ResourceManager as one thread was holding the 
> SchedulerApplicationAttempt lock and trying to call 
> RMAppAttemptImpl.getMasterContainer while another thread had the 
> RMAppAttemptImpl lock and was trying to call 
> SchedulerApplicationAttempt.getResourceUsageReport.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4595) Add support for configurable read-only mounts

2016-03-19 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-4595:
-
Attachment: YARN-4595.2.patch

Rebased patch.

> Add support for configurable read-only mounts
> -
>
> Key: YARN-4595
> URL: https://issues.apache.org/jira/browse/YARN-4595
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-4595.1.patch, YARN-4595.2.patch
>
>
> Mounting files or directories from the host is one way of passing 
> configuration and other information into a docker container.  We could allow 
> the user to set a list of mounts in the environment of ContainerLaunchContext 
> (e.g. /dir1:/targetdir1,/dir2:/targetdir2).  These would be mounted read-only 
> to the specified target locations.
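
As a rough illustration of the proposal, a comma-separated mount list like the one above could be parsed on the NM side into read-only docker bind-mount arguments. The sketch below is not from the patch; the class and method names are hypothetical.

{code}
// Hypothetical sketch: turn "/dir1:/targetdir1,/dir2:/targetdir2" into
// read-only docker bind-mount arguments ("-v src:dst:ro").
import java.util.ArrayList;
import java.util.List;

public class ReadOnlyMounts {
  public static List<String> toDockerArgs(String mountList) {
    List<String> args = new ArrayList<>();
    if (mountList == null || mountList.isEmpty()) {
      return args;
    }
    for (String mount : mountList.split(",")) {
      String[] parts = mount.split(":");
      if (parts.length != 2) {
        throw new IllegalArgumentException("Invalid mount spec: " + mount);
      }
      args.add("-v");
      args.add(parts[0] + ":" + parts[1] + ":ro");   // read-only bind mount
    }
    return args;
  }
}
{code}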



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197956#comment-15197956
 ] 

Vinod Kumar Vavilapalli commented on YARN-2915:
---

One thing that occurred to me in an offline conversation with [~subru] and 
[~leftnoteasy] is the modeling of queues and their shares in different 
sub-clusters.

As already seems to be proposed, it is very desirable to have unified *logical 
queues* that are applicable across all sub-clusters.

With unified logical queues, it looks like there are some proposals for how 
resources can get sub-divided amongst different sub-clusters. But to me, they 
already map to an existing concept in YARN - *Node Partitions* / node-labels!

Essentially you have *one YARN cluster* -> *multiple sub-clusters* -> *each 
sub-cluster with multiple node-partitions*. This can further be extended to 
more levels; for example, we can also unify racks under the same concept.

The advantage of unifying this with node-partitions is that we can have
 - a single administrative view of sub-clusters, node-partitions, racks, etc.
 - unified configuration mechanisms: today we already support centralized and 
distributed node-partition mechanisms, exclusive / non-exclusive access, etc.
 - unified queue-sharing models: today we can already assign X% of a 
node-partition to a queue. This way we can, again, reuse existing concepts, 
mental models and allocation policies - instead of creating specific policies 
for sub-cluster sharing like the user-based share that is proposed.

We will have to dig deeper into the details, but it seems to me that 
node-partitions and sub-clusters are essentially equivalent, except for the 
fact that two sub-clusters report to two different RMs (physically / 
implementation-wise), which isn't the case today with node-partitions.

Thoughts? /cc [~curino] [~chris.douglas]
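
To make the "assign X% of a node-partition to a queue" point concrete, the existing capacity scheduler node-label configuration already expresses per-partition shares. A small sketch follows; the property keys are assumed to be the capacity scheduler's accessible-node-labels settings and the queue names and values are illustrative only.

{code}
// Sketch of the existing per-partition queue capacity model referred to above:
// queue "a" gets 40% of the default partition and 100% of a "subclusterX"
// partition. Names and values are illustrative only.
import org.apache.hadoop.conf.Configuration;

public class PartitionCapacityExample {
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.root.a.capacity", "40");
    conf.set("yarn.scheduler.capacity.root.a.accessible-node-labels",
        "subclusterX");
    conf.set(
        "yarn.scheduler.capacity.root.a.accessible-node-labels.subclusterX.capacity",
        "100");
    return conf;
  }
}
{code}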

> Enable YARN RM scale out via federation using multiple RM's
> ---
>
> Key: YARN-2915
> URL: https://issues.apache.org/jira/browse/YARN-2915
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>Assignee: Subru Krishnan
> Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, 
> Federation-BoF.pdf, Yarn_federation_design_v1.pdf, federation-prototype.patch
>
>
> This is an umbrella JIRA that proposes to scale out YARN to support large 
> clusters comprising tens of thousands of nodes. That is, rather than 
> limiting a YARN-managed cluster to about 4k nodes, the proposal is to enable 
> the YARN-managed cluster to be elastically scalable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4838) TestLogAggregationService. testLocalFileDeletionOnDiskFull failed

2016-03-19 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-4838:


 Summary: TestLogAggregationService. 
testLocalFileDeletionOnDiskFull failed
 Key: YARN-4838
 URL: https://issues.apache.org/jira/browse/YARN-4838
 Project: Hadoop YARN
  Issue Type: Test
  Components: log-aggregation
Reporter: Haibo Chen


org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
testLocalFileDeletionOnDiskFull failed

java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:232)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:288)

The failure is caused by a timing issue with the DeletionService, which runs 
its own thread pool to delete files. When verifyLocalFileDeletion() checks for 
file existence, it is possible that the FileDeletionTask has not yet been 
executed by the DeletionService's thread pool.
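
One common way to make such an assertion robust against the asynchronous DeletionService is to poll for the expected state instead of checking it once. The snippet below is only an illustrative sketch, not the actual test code; the helper name and timeouts are hypothetical.

{code}
// Sketch: poll until the async FileDeletionTask has actually removed the
// file, instead of asserting its absence immediately.
import java.io.File;
import org.junit.Assert;

public class DeletionAsserts {
  public static void assertEventuallyDeleted(File file, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (file.exists() && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);               // give the deletion thread pool time to run
    }
    Assert.assertFalse("File was not deleted within " + timeoutMs + " ms: "
        + file, file.exists());
  }
}
{code}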



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4825) Remove redundant code in ClientRMService::listReservations

2016-03-19 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4825:
-
Attachment: YARN-4825-v1.patch

Attaching a patch that removes the redundant code.

Note: the patch does NOT have any test case modifications, as it only removes 
duplicate code.
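
For context, the duplication being removed is of this general shape: the same null check and ReservationId parse appearing twice on the request path, consolidated into a single parse whose result is reused. The snippet below is a simplified illustration only, not the actual ClientRMService code; the names and the Long stand-in for ReservationId are hypothetical.

{code}
// Simplified illustration of consolidating a duplicated null-check/parse.
// Before: both the validation block and the listing block re-parsed the id.
// After: parse once, reuse the result.
public class ListReservationsSketch {

  static String listReservations(String reservationIdStr) {
    // Single null check and parse, instead of repeating it further down.
    Long reservationId = null;
    if (reservationIdStr != null && !reservationIdStr.isEmpty()) {
      reservationId = Long.parseLong(reservationIdStr);  // stand-in for the real id parse
    }
    return doList(reservationId);
  }

  private static String doList(Long reservationId) {
    return reservationId == null ? "all reservations" : "reservation " + reservationId;
  }
}
{code}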

> Remove redundant code in ClientRMService::listReservations
> --
>
> Key: YARN-4825
> URL: https://issues.apache.org/jira/browse/YARN-4825
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Minor
> Attachments: YARN-4825-v1.patch
>
>
> We do the null check and parsing of ReservationId twice currently in 
> ClientRMService::listReservations. This happened due to parallel changes as 
> part of YARN-4340 and YARN-2575. This JIRA proposes cleaning up the redundant 
> code



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles

2016-03-19 Thread Lei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197925#comment-15197925
 ] 

Lei Guo commented on YARN-3926:
---

Another topic related to the rm-nm protocol is the constraint label. It's not 
a must to be considered in this Jira, but I'd like to raise it as I can see 
that the design in this Jira may affect the constraint-label one.

A constraint label could be a server attribute reported by the NM. It could be 
required to be predefined in the RM, but it would be great if we could allow 
the NM to report something not defined in the RM and have the RM automatically 
add it to the label repository. For example, for the OS version or JDK 
version, customers may prefer labels to be added automatically instead of 
having to add them before use.
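
The kind of NM-reported attribute described above can be gathered from standard system properties; how it would flow through the rm-nm protocol and get auto-registered in the RM's label store is the open design question. The sketch below covers only the collection side and uses hypothetical class names.

{code}
// Hypothetical sketch: collect node attributes such as OS and JDK version on
// the NodeManager side, to be reported to the RM as constraint labels.
import java.util.HashMap;
import java.util.Map;

public class NodeAttributeCollector {
  public static Map<String, String> collect() {
    Map<String, String> attrs = new HashMap<>();
    attrs.put("os.name", System.getProperty("os.name"));
    attrs.put("os.version", System.getProperty("os.version"));
    attrs.put("java.version", System.getProperty("java.version"));
    return attrs;   // e.g. {os.name=Linux, java.version=1.8.0_74, ...}
  }
}
{code}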

> Extend the YARN resource model for easier resource-type management and 
> profiles
> ---
>
> Key: YARN-3926
> URL: https://issues.apache.org/jira/browse/YARN-3926
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Proposal for modifying resource model and profiles.pdf
>
>
> Currently, there are efforts to add support for various resource-types such 
> as disk(YARN-2139), network(YARN-2140), and  HDFS bandwidth(YARN-2681). These 
> efforts all aim to add support for a new resource type and are fairly 
> involved efforts. In addition, once support is added, it becomes harder for 
> users to specify the resources they need. All existing jobs have to be 
> modified, or have to use the minimum allocation.
> This ticket is a proposal to extend the YARN resource model to a more 
> flexible model which makes it easier to support additional resource-types. It 
> also considers the related aspect of “resource profiles” which allow users to 
> easily specify the various resources they need for any given container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4815) ATS 1.5 timelineclinet impl try to create attempt directory for every event call

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199611#comment-15199611
 ] 

Junping Du commented on YARN-4815:
--

The patch seems to be out of sync with trunk. [~xgong], can you rebase the 
patch against the latest trunk? Thanks!

> ATS 1.5 timelineclinet impl try to create attempt directory for every event 
> call
> 
>
> Key: YARN-4815
> URL: https://issues.apache.org/jira/browse/YARN-4815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4815.1.patch
>
>
> ATS 1.5 timelineclinet impl, try to create attempt directory for every event 
> call. Since per attempt only one call to create directory is enough, this is 
> causing perf issue.
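
The per-attempt "create the directory only once" behaviour the description asks for can be kept with a small concurrent cache of attempts whose directories are already known to exist. The sketch below is illustrative only; the identifiers are hypothetical and not the timeline client's actual fields.

{code}
// Hypothetical sketch: remember which attempt directories were already
// created so that subsequent event calls for the same attempt skip the
// filesystem round trip.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AttemptDirCache {
  private final Set<String> createdAttemptDirs =
      ConcurrentHashMap.newKeySet();

  public void ensureAttemptDirExists(String attemptId) {
    // add() returns false if this client already created the attempt dir.
    if (createdAttemptDirs.add(attemptId)) {
      createAttemptDir(attemptId);     // real impl would call FileSystem.mkdirs()
    }
  }

  private void createAttemptDir(String attemptId) {
    // placeholder for the actual directory creation
  }
}
{code}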



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

