[jira] [Commented] (YARN-4795) ContainerMetrics drops records

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202571#comment-15202571
 ] 

Hadoop QA commented on YARN-4795:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 23s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 9s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 50s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12793202/YARN-4795.001.patch |
| JIRA Issue | YARN-4795 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 6bf5bc037e48 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 33239c9 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  

[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202592#comment-15202592
 ] 

Wangda Tan commented on YARN-4002:
--

[~rohithsharma], [~zhiguohong],
Thanks for working on this patch, generally looks good.
Do you think we need to acquire readlock in printConfiguredHosts and 
setDecomissionedNMsMetrics?

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think it's 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So RWLock could be used to  alow concurrent access by RPC 
> threads.
> Second, since he fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the reader 
> side lock could just be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4815) ATS 1.5 timelineclinet impl try to create attempt directory for every event call

2016-03-19 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198733#comment-15198733
 ] 

Li Lu commented on YARN-4815:
-

Fine... My concern is that we do not need to have separate caches for each use 
cases if they can be modeled by Guava. I'm fine with either ways. 

> ATS 1.5 timelineclinet impl try to create attempt directory for every event 
> call
> 
>
> Key: YARN-4815
> URL: https://issues.apache.org/jira/browse/YARN-4815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4815.1.patch
>
>
> ATS 1.5 timelineclinet impl, try to create attempt directory for every event 
> call. Since per attempt only one call to create directory is enough, this is 
> causing perf issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200536#comment-15200536
 ] 

Karthik Kambatla commented on YARN-4686:


bq. Currently it is in the RM start method, but if you feel that it is better 
to put it in the MiniYARNCluster start method then we can add the check there.
I was just wondering if we needed to transition to active. I am fine with the 
check being at either place. We can stick to this if it keeps the code clean.

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, 
> YARN-4686.005.patch, YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199400#comment-15199400
 ] 

Steve Loughran commented on YARN-4746:
--

looks reasonable.

Given all uses of the conversion in these pages now look like: 
{code}

try {
  id = ConverterUtils.toApplicationId(recordFactory, appId);
} catch (IllegalArgumentException e) {
  throw new BadRequestException(e);
}

{code}

could we factor this out into something which always does this for the web UIs?

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch
>
>
> I'm seeing somewhere in the WS API tests of mine an error with exception 
> conversion of  a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services implies a same problem
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised argument and explicitly converting it



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197218#comment-15197218
 ] 

Steve Loughran commented on YARN-4820:
--

What about 307 redirects? Even if they aren't generated in any RM REST API yet, 
they might ... this filter should be ready for them

> ResourceManager web redirects in HA mode drops query parameters
> ---
>
> Key: YARN-4820
> URL: https://issues.apache.org/jira/browse/YARN-4820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4820.001.patch
>
>
> The RMWebAppFilter redirects http requests from the standby to the active. 
> However it drops all the query parameters when it does the redirect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4835) [YARN-3368] REST API related changes for new Web UI

2016-03-19 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4835:
--

 Summary: [YARN-3368] REST API related changes for new Web UI
 Key: YARN-4835
 URL: https://issues.apache.org/jira/browse/YARN-4835
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Reporter: Varun Saxena
Assignee: Varun Saxena






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4833) For Queue AccessControlException client retries multiple times on both RM

2016-03-19 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4833:
---
Description: 
Submit application to queue where ACL is enabled and submitted user is not  
having access. Client retries till failMaxattempt 10 times.

{noformat}
16/03/18 10:01:06 INFO retry.RetryInvocationHandler: Exception while invoking 
submitApplication of class ApplicationClientProtocolPBClientImpl over rm1. 
Trying to fail over immediately.
org.apache.hadoop.security.AccessControlException: User hdfs does not have 
permission to submit application_1458273884145_0001 to queue default
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:380)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:291)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:618)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:252)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:483)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2360)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2356)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2356)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:272)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:257)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy23.submitApplication(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:261)
at 
org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:295)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:359)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:367)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 

[jira] [Commented] (YARN-4825) Remove redundant code in ClientRMService::listReservations

2016-03-19 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200854#comment-15200854
 ] 

Subru Krishnan commented on YARN-4825:
--

The test case failures are consistent and unrelated and are covered in YARN-4478

> Remove redundant code in ClientRMService::listReservations
> --
>
> Key: YARN-4825
> URL: https://issues.apache.org/jira/browse/YARN-4825
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Minor
> Attachments: YARN-4825-v1.patch
>
>
> We do the null check and parsing of ReservationId twice currently in 
> ClientRMService::listReservations. This happened due to parallel changes as 
> part of YARN-4340 and YARN-2575. This JIRA proposes cleaning up the redundant 
> code



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201868#comment-15201868
 ] 

Varun Saxena commented on YARN-4712:


I have committed this to branch YARN-2928.
Thanks [~Naganarasimha] for your contribution.
And thanks [~sjlee0], [~djp] and [~sunilg] for reviews.

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
>
> Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch, 
> YARN-4712-YARN-2928.v1.004.patch, YARN-4712-YARN-2928.v1.005.patch, 
> YARN-4712-YARN-2928.v1.006.patch
>
>
> There are 2 issues with CPU usage collection 
> * I was able to observe that that many times CPU usage got from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE(i.e. -1) but ContainersMonitor do 
> the calculation  i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore 
> /resourceCalculatorPlugin.getNumProcessors()}} because of which UNAVAILABLE 
> check in {{NMTimelinePublisher.reportContainerResourceUsage}} is not 
> encountered. so proper checks needs to be handled
> * {{EntityColumnPrefix.METRIC}} uses always LongConverter but 
> ContainerMonitor is publishing decimal values for the CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4822) Refactor existing Preemption Policy of CS for easier adding new approach to select preemption candidates

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200526#comment-15200526
 ] 

Hadoop QA commented on YARN-4822:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 2m 54s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74
 with JDK v1.8.0_74 generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) 
{color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 39 new + 30 unchanged - 29 fixed = 69 total (was 59) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 9s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 12s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 153m 25s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 

[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199449#comment-15199449
 ] 

Junping Du commented on YARN-4820:
--

Hi [~ste...@apache.org], I think the redirect logic now is return 307 now:
{code}
response.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);
{code}
Do I miss something here?


> ResourceManager web redirects in HA mode drops query parameters
> ---
>
> Key: YARN-4820
> URL: https://issues.apache.org/jira/browse/YARN-4820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4820.001.patch
>
>
> The RMWebAppFilter redirects http requests from the standby to the active. 
> However it drops all the query parameters when it does the redirect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to only kill containers that would satisfy the incoming request

2016-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198600#comment-15198600
 ] 

Hudson commented on YARN-4108:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9470 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9470/])
YARN-4108. CapacityScheduler: Improve preemption to only kill containers 
(wangda: rev ae14e5d07f1b6702a5160637438028bb03d9387e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/preemption/KillableContainer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/preemption/PreemptableQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/preemption/PreemptionManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerPreemption.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 

[jira] [Commented] (YARN-4576) Enhancement for tracking Blacklist in AM Launching

2016-03-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202664#comment-15202664
 ] 

Varun Vasudev commented on YARN-4576:
-

bq. No. DISKS_FAILED mark on bad disks are transient status for the node. Take 
an example, if 
"yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" 
is set to 90% (by default), another job (no matter YARN or not) writing some 
files and deleting them afterwards back-and-forth - if disks usage for the node 
is just happen to be around 90%, it make NM's healthy status report to RM 
between healthy and unhealthy back and forth. Blacklist of AM launching can 
evaluate history record to decide a better place to launch AM. The bar for 
launching normal containers could be different or we could end up with so less 
choice.

Couple of points here -
# Specifically to the disks failed issue, there is now support for a watermark 
to avoid the issue you described - YARN-3943.
# To the more general point of nodes switching back and forth from good to bad 
and back - the better solution would be to have the RM detect bouncing nodes 
and then not to allocate new containers to bouncing nodes until they stabilize.


> Enhancement for tracking Blacklist in AM Launching
> --
>
> Key: YARN-4576
> URL: https://issues.apache.org/jira/browse/YARN-4576
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: EnhancementAMLaunchingBlacklist.pdf
>
>
> Before YARN-2005, YARN blacklist mechanism is to track the bad nodes by AM:  
> If AM tried to launch containers on a specific node get failed for several 
> times, AM will blacklist this node in future resource asking. This mechanism 
> works fine for normal containers. However, from our observation on behaviors 
> of several clusters: if this problematic node launch AM failed, then RM could 
> pickup this problematic node to launch next AM attempts again and again that 
> cause application failure in case other functional nodes are busy. In normal 
> case, the customized healthy checker script cannot be so sensitive to mark 
> node as unhealthy when one or two containers get launched failed. 
> After YARN-2005, we can have a BlacklistManager in each RMapp, so those nodes 
> who launching AM attempts failed for specific application before will get 
> blacklisted. To get rid of potential risks that all nodes being blacklisted 
> by BlacklistManager, a disable-failure-threshold is involved to stop adding 
> more nodes into blacklist if hit certain ratio already. 
> There are already some enhancements for this AM blacklist mechanism: 
> YARN-4284 is to address the more wider case for AM container get launched 
> failure and YARN-4389 tries to make configuration settings available for 
> change by App to meet app specific requirement. However, there are still 
> several gaps to address more scenarios:
> 1. We may need a global blacklist instead of each app maintain a separated 
> one. The reason is: AM could get more chance to fail if other AM get failed 
> before. A quick example is: in a busy cluster, all nodes are busy except two 
> problematic nodes: node a and node b, app1 already submit and get failed in 
> two AM attempts on a and b. app2 and other apps should wait for other busy 
> nodes rather than waste attempts on these two problematic nodes.
> 2. If AM container failure is recognized as global event instead app own 
> issue, we should consider the blacklist is not a permanent thing but with a 
> specific time window. 
> 3. We could have user defined black list polices to address more possible 
> cases and scenarios, so it reasonable to make blacklist policy pluggable.
> 4. For some test scenario, we could have whitelist mechanism for AM launching.
> 5. Some minor issues: it sounds like NM reconnect won't refresh blacklist so 
> far.
> Will try to address all issues here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4834) ProcfsBasedProcessTree doesn't track daemonized processes

2016-03-19 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-4834:


 Summary: ProcfsBasedProcessTree doesn't track daemonized processes
 Key: YARN-4834
 URL: https://issues.apache.org/jira/browse/YARN-4834
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.2, 3.0.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts


Currently the algorithm uses ppid from /proc//stat which can be 1 if a 
child process has daemonized itself. This causes potentially large processes 
from not being monitored. 

session id might be a better choice since that's what we use to signal the 
container during teardown. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4743) ResourceManager crash because TimSort

2016-03-19 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-4743:
--

Assignee: Yufei Gu

> ResourceManager crash because TimSort
> -
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Zephyr Guo
>Assignee: Yufei Gu
>
> {code}
> 2016-02-26 14:08:50,821 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>  at java.util.TimSort.mergeHi(TimSort.java:868)
>  at java.util.TimSort.mergeAt(TimSort.java:485)
>  at java.util.TimSort.mergeCollapse(TimSort.java:410)
>  at java.util.TimSort.sort(TimSort.java:214)
>  at java.util.TimSort.sort(TimSort.java:173)
>  at java.util.Arrays.sort(Arrays.java:659)
>  at java.util.Collections.sort(Collections.java:217)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resouce}} while we are sorting 
> {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator comparator = policy.getComparator();
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
> ..
>   s1.getResourceUsage(), minShare1);
>   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>   s2.getResourceUsage(), minShare2);
>   minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare1, 
> ONE).getMemory();
>   minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare2, 
> ONE).getMemory();
> ..
> {code}
> {{getResourceUsage}} will return current Resource. The current Resource is 
> unstable. 
> {code:title=FSAppAttempt.java}
> @Override
>   public Resource getResourceUsage() {
> // Here the getPreemptedResources() always return zero, except in
> // a preemption round
> return Resources.subtract(getCurrentConsumption(), 
> getPreemptedResources());
>   }
> {code}
> {code:title=SchedulerApplicationAttempt}
>  public Resource getCurrentConsumption() {
> return currentConsumption;
>   }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
> ..
> Resources.addTo(currentConsumption, rmContainer.getContainer()
>   .getResource());
> ..
>   }
> {code}
> I suggest that use stable Resource in comparator.
> Is there something i think wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles

2016-03-19 Thread Lei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197925#comment-15197925
 ] 

Lei Guo commented on YARN-3926:
---

Another topic related to rm-nm protocol is  constraint label. It's not a must 
to be considered in this Jira, but I'd like to raise it as I can see the design 
in this Jira may affect the constraint label one. 

The constraint label could be some server attribute reported by NM, it could be 
required to be predefined in RM, but if we can allow NM to define something not 
defined in RM, and then RM automatically add it in label repository, it will be 
great. for example, for OS version or JDK version, customer may prefer 
automatically added instead of adding label before use.

> Extend the YARN resource model for easier resource-type management and 
> profiles
> ---
>
> Key: YARN-3926
> URL: https://issues.apache.org/jira/browse/YARN-3926
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Proposal for modifying resource model and profiles.pdf
>
>
> Currently, there are efforts to add support for various resource-types such 
> as disk(YARN-2139), network(YARN-2140), and  HDFS bandwidth(YARN-2681). These 
> efforts all aim to add support for a new resource type and are fairly 
> involved efforts. In addition, once support is added, it becomes harder for 
> users to specify the resources they need. All existing jobs have to be 
> modified, or have to use the minimum allocation.
> This ticket is a proposal to extend the YARN resource model to a more 
> flexible model which makes it easier to support additional resource-types. It 
> also considers the related aspect of “resource profiles” which allow users to 
> easily specify the various resources they need for any given container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4825) Remove redundant code in ClientRMService::listReservations

2016-03-19 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4825:
-
Attachment: YARN-4825-v1.patch

Attaching a patch that removes redundant code.

Note: the patch does NOT have any test case modifications as it only removes 
duplicate code

> Remove redundant code in ClientRMService::listReservations
> --
>
> Key: YARN-4825
> URL: https://issues.apache.org/jira/browse/YARN-4825
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Minor
> Attachments: YARN-4825-v1.patch
>
>
> We do the null check and parsing of ReservationId twice currently in 
> ClientRMService::listReservations. This happened due to parallel changes as 
> part of YARN-4340 and YARN-2575. This JIRA proposes cleaning up the redundant 
> code



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4838) TestLogAggregationService. testLocalFileDeletionOnDiskFull failed

2016-03-19 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-4838:


 Summary: TestLogAggregationService. 
testLocalFileDeletionOnDiskFull failed
 Key: YARN-4838
 URL: https://issues.apache.org/jira/browse/YARN-4838
 Project: Hadoop YARN
  Issue Type: Test
  Components: log-aggregation
Reporter: Haibo Chen


org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
testLocalFileDeletionOnDiskFull failed

java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:232)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:288)

The failure is caused by the time issue of DeletionService. DeletionService 
runs its only thread pool to delete files. When verfiyLocalFileDeletion() 
method checks file existence, it is possible that the FileDeletionTask has been 
executed by the thread pool in DeletionService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197956#comment-15197956
 ] 

Vinod Kumar Vavilapalli commented on YARN-2915:
---

One thing that occurred to me in an offline conversation with [~subru] and 
[~leftnoteasy] is about the modeling of queues and their shares in different 
sub-clusters.

As seems to be already proposed, it is very desirable to have a unified *logic 
queues* that are applicable across all sub-clusters.

With unified logical queues, looks like there are some proposals for ways of 
how resources can get sub-divided amongst different sub-clusters. But to me, 
they already map to an existing concept in YARN - *Node Partitions* / 
node-labels !

Essentially you have *one YARN cluster* -> *multiple sub-clusters* -> *each 
sub-cluster with multiple node-partitions*. This can further be extended to 
more levels. For e.g. we can unify rack also under the same concept.

The advantage of unifying this with node-partitions is that we can have
 - one single administrative view philosophy of sub-clusters, node-partitions, 
racks etc
 - unified configuration mechanisms: Today we support centralized and 
distributed node-partition mechanisms, exclusive / non-exclusive access etc.
 - unified queue-sharing models - today we already can assign X% of a 
node-partition to a queue. This way we can, again, reuse existing concepts, 
mental models and allocation policies - instead of creating specific policies 
for sub-cluster sharing like the user-based share that is proposed.

We will have to dig deeper into the details, but it seems to me that 
node-partition and sub-cluster are equivalence classes except for the fact that 
two sub-clusters report to two different RMs (physically / implementation wise) 
which isn't the case today with node-partitions.

Thoughts? /cc [~curino] [~chris.douglas]

> Enable YARN RM scale out via federation using multiple RM's
> ---
>
> Key: YARN-2915
> URL: https://issues.apache.org/jira/browse/YARN-2915
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>Assignee: Subru Krishnan
> Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, 
> Federation-BoF.pdf, Yarn_federation_design_v1.pdf, federation-prototype.patch
>
>
> This is an umbrella JIRA that proposes to scale out YARN to support large 
> clusters comprising of tens of thousands of nodes.   That is, rather than 
> limiting a YARN managed cluster to about 4k in size, the proposal is to 
> enable the YARN managed cluster to be elastically scalable.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4595) Add support for configurable read-only mounts

2016-03-19 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-4595:
-
Attachment: YARN-4595.2.patch

Rebased patch.

> Add support for configurable read-only mounts
> -
>
> Key: YARN-4595
> URL: https://issues.apache.org/jira/browse/YARN-4595
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-4595.1.patch, YARN-4595.2.patch
>
>
> Mounting files or directories from the host is one way of passing 
> configuration and other information into a docker container.  We could allow 
> the user to set a list of mounts in the environment of ContainerLaunchContext 
> (e.g. /dir1:/targetdir1,/dir2:/targetdir2).  These would be mounted read-only 
> to the specified target locations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt

2016-03-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201680#comment-15201680
 ] 

Jason Lowe commented on YARN-4839:
--

This appears to have been fixed as a side-effect of YARN-3361 which wasn't in 
the build that reproduced this issue.  That change updated getMasterContainer 
to avoid locking the RMAppAttemptImpl.

> ResourceManager deadlock between RMAppAttemptImpl and 
> SchedulerApplicationAttempt
> -
>
> Key: YARN-4839
> URL: https://issues.apache.org/jira/browse/YARN-4839
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> Hit a deadlock in the ResourceManager as one thread was holding the 
> SchedulerApplicationAttempt lock and trying to call 
> RMAppAttemptImpl.getMasterContainer while another thread had the 
> RMAppAttemptImpl lock and was trying to call 
> SchedulerApplicationAttempt.getResourceUsageReport.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200490#comment-15200490
 ] 

Vinod Kumar Vavilapalli commented on YARN-4837:
---

Here are my concerns
 - First up the feature isn't 'AM blacklisting' - we are not blacklisting AMs. 
The goal is for the system to not schedule AMs on faulty nodes. The right 
solution is to identify why we keep launching on bad-nodes instead of marking 
them unhealthy - but I can see why a blacklist threshold is useful when we 
*simply don't know*.
 - The configurations are all named yarn.am.blacklisting even though they 
should be under a yarn.resourcemanager hierarchy
 - We just blindly add a node to the app's blacklist even if we just hit *one* 
AM failure. And the error / exit-code doesn't matter at all.
 - Irrespective of all that, I actually don't see why we should already expose 
this to end-users i.e the whole premise of YARN-4389. Why should an app 
specifically care "the number of nodes YARN blacklists for my AM container 
launch"?

I'm digging into the feature more for a careful look.

/cc
 - [~adhoot], [~jlowe], [~kasha] who were involved with YARN-2005 for the 
naming changes
 - [~sunilg] / [~djp] who worked on YARN-4389.

While we discuss this, I think we should take the private feature before 2.8.0 
goes out.

> User facing aspects of 'AM blacklisting' feature need fixing
> 
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed 
> before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4829) Add support for binary units

2016-03-19 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4829:

Attachment: YARN-4829-YARN-3926.003.patch

Good catch. Updated both of them.

> Add support for binary units
> 
>
> Key: YARN-4829
> URL: https://issues.apache.org/jira/browse/YARN-4829
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4829-YARN-3926.001.patch, 
> YARN-4829-YARN-3926.002.patch, YARN-4829-YARN-3926.003.patch
>
>
> The units conversion util should have support for binary units.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4790) Per user blacklist node for user specific error for container launch failure.

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201156#comment-15201156
 ] 

Vinod Kumar Vavilapalli commented on YARN-4790:
---

bq. when enabling LinuxContainerExecutor, but some node doesn't have such user 
exists
As for the root-cause reported on this JIRA, this invalidates our fundamental 
assumptions of LinuxContainerExecutor. We assume that user-accounts 
corresponding to all job-submitters are present on all the machines. If not it 
is a gross misconfiguration of the system, and should be handled by lower 
layers like installers / management systems.

If it is really deemed that we should support different user-accounts on 
different hosts (for whatever reason), then the right way to look at solving 
that problem is by recognizing user-accounts as a resource on each host - kind 
of like node-constraints. Blacklisting that node for an app is absolutely the 
wrong way to go about it.

> Per user blacklist node for user specific error for container launch failure.
> -
>
> Key: YARN-4790
> URL: https://issues.apache.org/jira/browse/YARN-4790
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Junping Du
>
> There are some user specific error for container launch failure, like:
> when enabling LinuxContainerExecutor, but some node doesn't have such user 
> exists, so container launch should get failed with following information:
> {noformat}
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1434045496283_0036_02 State change from LAUNCHED to FAILED 
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application 
> application_1434045496283_0036 failed 2 times due to AM Container for 
> appattempt_1434045496283_0036_02 exited with exitCode: -1000 due to: 
> Application application_1434045496283_0036 initialization failed 
> (exitCode=255) with output: User jdu not found 
> {noformat}
> Obviously, this node is not suitable for launching container for this user's 
> other applications. We need a per user blacklist track mechanism rather than 
> per application now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2016-03-19 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197300#comment-15197300
 ] 

Yi Zhou commented on YARN-796:
--

Thanks [~Naganarasimha]

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197388#comment-15197388
 ] 

Junping Du commented on YARN-4785:
--

Thanks [~vvasudev] for explanation. I verified it in branch-2.7 that the test 
without main code change will failed with below failures:
{noformat}
testClusterScheduler(org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched)
  Time elapsed: 0.463 sec  <<< FAILURE!
org.junit.ComparisonFailure: "type" field is incorrect 
expected:<[capacitySchedulerLeafQueueInfo]> but 
was:<[["capacitySchedulerLeafQueueInfo"]]>
at org.junit.Assert.assertEquals(Assert.java:115)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched.verifySubQueue(TestRMWebServicesCapacitySched.java:378)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched.verifySubQueue(TestRMWebServicesCapacitySched.java:375)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched.verifyClusterScheduler(TestRMWebServicesCapacitySched.java:331)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched.testClusterScheduler(TestRMWebServicesCapacitySched.java:179)
{noformat}
The test failure reported by Jenkins is not related and we already saw it many 
times (like in YARN-998). 
So +1. Patch LGTM. Will wait a while to commit in case someone else want to 
review it also.


> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Attachments: YARN-4785.001.patch
>
>
> I see inconsistent value type ( String and Array ) of the "type" field for 
> LeafQueueInfo in response of RM REST API - cluster/scheduler
> as per the spec it should be always String.
> here is the sample output ( removed non-relevant fields )
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-03-19 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-4062:
-
Attachment: YARN-4062-YARN-2928.08.patch


Attaching patch v8 to address Sangjin’s review comments. 

bq. (FlowRunCoprocessor.java) l.268: Logging of request.isMajor() seems a bit 
cryptic. It would print strings like "Compactionrequest= ... true for 
ObserverContext ...". Should we do something like request.isMajor() ? "major 
compaction" : "minor compaction" instead? And did you mean it to be an INFO 
logging statement?

Updated the log message. I would like to log it at the INFO level. Compactions 
have traditionally taken up cycles on region servers such that they have been 
blocked from other operations, hence it would be good to know the details of 
this request that has come in. 

bq. (FlowRunColumnPrefix.java) l.43: if we're removing the aggregation 
operation from the only enum entry, should we remove the aggregation operation 
(aggOp) completely from this class?
As we chatted offline, the aggOp operation should be part of any ColumnPrefix 
for this table. I have filed YARN-4786 for enhancements to add in a FINAL 
attribute (among other things).

bq. l.381-382: should we throw an exception as this is not possible 
(currently)?
Hmm. Not sure if throwing an exception is the right thing here. Thinking out 
loud, in the case that we add in another cell agg operation but have a 
different scanner invoked, throwing an exception won’t be correct.  

thanks
Vrushali
bq. ▪   l.220: could you elaborate on why this change is needed? I'm 
generally not too clear on the difference betweencellLimit and limit. 

There was a warning generated by check style I think, which is why I modified 
the code. cellLimit and limit are differently named variables since previously 
some functions had an argument which was called limit and check style flagged 
that in the maven jenkins build.


> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.04.patch, 
> YARN-4062-YARN-2928.05.patch, YARN-4062-YARN-2928.06.patch, 
> YARN-4062-YARN-2928.07.patch, YARN-4062-YARN-2928.08.patch, 
> YARN-4062-YARN-2928.1.patch, YARN-4062-feature-YARN-2928.01.patch, 
> YARN-4062-feature-YARN-2928.02.patch, YARN-4062-feature-YARN-2928.03.patch
>
>
> As part of YARN-3901, coprocessor and scanner is being added for storing into 
> the flow_run table. It also needs a flush & compaction processing in the 
> coprocessor and perhaps a new scanner to deal with the data during flushing 
> and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4831) Recovered containers will be killed after NM stateful restart

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198512#comment-15198512
 ] 

Hadoop QA commented on YARN-4831:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 patch generated 1 new + 91 unchanged - 0 fixed = 92 total (was 91) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 11s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 50s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 35s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12793824/YARN-4831.v1.patch |
| JIRA Issue | YARN-4831 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux e771231775da 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 

[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1518#comment-1518
 ] 

Vinod Kumar Vavilapalli commented on YARN-4595:
---

Can this also be the general solution for accessing all (or read-only 
files/dirs) of distributed-cache files inside a docker container? If so, this 
will enable us to mix and match artifacts inside the image and outside the 
image.

> Add support for configurable read-only mounts
> -
>
> Key: YARN-4595
> URL: https://issues.apache.org/jira/browse/YARN-4595
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-4595.1.patch, YARN-4595.2.patch
>
>
> Mounting files or directories from the host is one way of passing 
> configuration and other information into a docker container.  We could allow 
> the user to set a list of mounts in the environment of ContainerLaunchContext 
> (e.g. /dir1:/targetdir1,/dir2:/targetdir2).  These would be mounted read-only 
> to the specified target locations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt

2016-03-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201884#comment-15201884
 ] 

Sunil G commented on YARN-4839:
---

I feel that a call to {{RMAppAttemptImpl.getMasterContainer }} is not very 
clean or good from Scheduler side. I mentioned this point once in YARN-2005. 
Its better we set that AM container is allocated via an Event to Scheduler or 
via a direct api. But it comes with few lines of extra code. There were few 
discussions on YARN-4143 for this. Will this be clean solution for this, so 
that we can remove the dependency.

> ResourceManager deadlock between RMAppAttemptImpl and 
> SchedulerApplicationAttempt
> -
>
> Key: YARN-4839
> URL: https://issues.apache.org/jira/browse/YARN-4839
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> Hit a deadlock in the ResourceManager as one thread was holding the 
> SchedulerApplicationAttempt lock and trying to call 
> RMAppAttemptImpl.getMasterContainer while another thread had the 
> RMAppAttemptImpl lock and was trying to call 
> SchedulerApplicationAttempt.getResourceUsageReport.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200138#comment-15200138
 ] 

Hadoop QA commented on YARN-4767:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 58s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 16s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: patch 
generated 6 new + 58 unchanged - 9 fixed = 64 total (was 67) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 19s {color} 
| {color:red} hadoop-yarn-server-web-proxy in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 9s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 22s 

[jira] [Commented] (YARN-4815) ATS 1.5 timelineclinet impl try to create attempt directory for every event call

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199611#comment-15199611
 ] 

Junping Du commented on YARN-4815:
--

The patch seems out of sync with trunk. [~xgong], can you rebase the patch 
against latest trunk? Thanks!

> ATS 1.5 timelineclinet impl try to create attempt directory for every event 
> call
> 
>
> Key: YARN-4815
> URL: https://issues.apache.org/jira/browse/YARN-4815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4815.1.patch
>
>
> ATS 1.5 timelineclinet impl, try to create attempt directory for every event 
> call. Since per attempt only one call to create directory is enough, this is 
> causing perf issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4836) [YARN-3368] Add AM related pages

2016-03-19 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4836:
--

 Summary: [YARN-3368] Add AM related pages
 Key: YARN-4836
 URL: https://issues.apache.org/jira/browse/YARN-4836
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Reporter: Varun Saxena
Assignee: Varun Saxena






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI

2016-03-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197668#comment-15197668
 ] 

Sunil G commented on YARN-4751:
---

Yes, agreeing to the point that we have to aggregate and peek into multiple 
patches to get the functionality. If 2.7 doesnt need new features/enhancements 
for labels, we can bring in patches on use case basis. cc/[~wangda.tan]

> In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
> ---
>
> Key: YARN-4751
> URL: https://issues.apache.org/jira/browse/YARN-4751
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: 2.7 CS UI No BarGraph.jpg, 
> YARH-4752-branch-2.7.001.patch, YARH-4752-branch-2.7.002.patch
>
>
> In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs 
> separated by partition. When applications are running on a labeled queue, no 
> color is shown in the bar graph, and several of the "Used" metrics are zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt

2016-03-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201578#comment-15201578
 ] 

Jason Lowe commented on YARN-4839:
--

Stack trace of the relevant threads:
{noformat}
"IPC Server handler 32 on 8030" #153 daemon prio=5 os_prio=0 
tid=0x7fb649603800 nid=0x20b1 waiting on condition [0x7fb5888d2000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00036de978f0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getMasterContainer(RMAppAttemptImpl.java:779)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:467)
- locked <0x00032a106f00> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:278)
- locked <0x00032a106f00> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1008)
- locked <0x00032a106f00> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:534)
- locked <0x000383ce08b0> (a 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server.call(Server.java:2267)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)

[...]

"1413615244@qtp-1677286081-37" #40337 daemon prio=5 os_prio=0 
tid=0x7fb62c089800 nid=0x1b8d waiting for monitor entry [0x7fb5ca40e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceUsageReport(SchedulerApplicationAttempt.java:580)
- waiting to lock <0x00032a106f00> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:267)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:826)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:580)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:815)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:457)
at sun.reflect.GeneratedMethodAccessor142.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 

[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Jayesh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197781#comment-15197781
 ] 

Jayesh commented on YARN-4785:
--

+1 ( thanks for explaining the solution in code comment )

> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Attachments: YARN-4785.001.patch
>
>
> I see inconsistent value type ( String and Array ) of the "type" field for 
> LeafQueueInfo in response of RM REST API - cluster/scheduler
> as per the spec it should be always String.
> here is the sample output ( removed non-relevant fields )
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4686:
--
Attachment: YARN-4686-branch-2.7.006.patch

[~eepayne] Attaching the branch-2.7 patch. It passed all of the tests locally 
on my machine. 

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, 
> YARN-4686-branch-2.7.006.patch, YARN-4686.001.patch, YARN-4686.002.patch, 
> YARN-4686.003.patch, YARN-4686.004.patch, YARN-4686.005.patch, 
> YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197845#comment-15197845
 ] 

Hadoop QA commented on YARN-4686:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 
30 unchanged - 0 fixed = 31 total (was 30) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 0s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 20s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 18s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 33s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 30s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | 

[jira] [Commented] (YARN-4829) Add support for binary units

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200326#comment-15200326
 ] 

Hadoop QA commented on YARN-4829:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 27s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
42s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
43s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 19s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 
7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | 

[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-03-19 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198218#comment-15198218
 ] 

Daniel Templeton commented on YARN-4311:


I don't have any further comments. LGTM.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v10.patch, 
> YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, 
> YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch, YARN-4311-v9.patch
>
>
> In order to fully forget about a node, removing the node from include and 
> exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The 
> tricky part that [~jlowe] pointed out was the case when include lists are not 
> used, in that case we don't want the nodes to fall off if they are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4831) Recovered containers will be killed after NM stateful restart

2016-03-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-4831:
--
Description: 
{code}
2016-03-04 19:43:48,130 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1456335621285_0040_01_66 transitioned from NEW to DONE
2016-03-04 19:43:48,130 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=henkins-service   
   OPERATION=Container Finished - Killed   TARGET=ContainerImpl
RESULT=SUCCESS  APPID=application_1456335621285_0040
{code}

> Recovered containers will be killed after NM stateful restart 
> --
>
> Key: YARN-4831
> URL: https://issues.apache.org/jira/browse/YARN-4831
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>
> {code}
> 2016-03-04 19:43:48,130 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1456335621285_0040_01_66 transitioned from NEW to 
> DONE
> 2016-03-04 19:43:48,130 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=henkins-service 
>OPERATION=Container Finished - Killed   TARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1456335621285_0040
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-03-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199784#comment-15199784
 ] 

Sunil G commented on YARN-4517:
---

[~varun_saxena]
bq.Regarding node labels, we can add it in REST response indicating if labels 
are enabled or not. We can do this later because this would require another 
JIRA for REST changes

For this, I have some done changes and we can immediately know whether labels 
are in cluster (i am finishing up with node-label page now, will upload a patch 
soon). I will try and see whether we can make a unified patch for all REST 
changes needed for UI together. Will syncup with you offline and will share 
summary here.

bq.Maybe something like: "You have to ssh to the missing nodes' /xxx/ dir 
to look for the logs" 
I am +1 for giving more information's. From RM, we can get the node 
ip/hostname. Atleast we can give a relative patch for getting dir for logs (may 
be from available path from yarn-default.xml). User can changes, so message can 
be a possible suggestion.


> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need nodes page added to next generation web UI, similar to existing 
> RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-998) Persistent resource change during NM/RM restart

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197345#comment-15197345
 ] 

Junping Du commented on YARN-998:
-

Thanks Jian for review and comments.
bq. DynamicResourceConfiguration(configuration, true), the second parameter is 
not needed because it’s always passing ‘true’;
Nice catch! Will remove it in next patch.

bq. instead of reload the config again, looks like we can just call 
resourceTrackerServce.set(newConf) to replace the config? newConfig is reloaded 
earlier in the same call path.
I thought of this before but my original concern is a bit risky to have an api 
to replace config with whatever come in. Will update it if this is not a valid 
concern.

> Persistent resource change during NM/RM restart
> ---
>
> Key: YARN-998
> URL: https://issues.apache.org/jira/browse/YARN-998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-998-sample.patch, YARN-998-v1.patch
>
>
> When NM is restarted by plan or from a failure, previous dynamic resource 
> setting should be kept for consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4576) Enhancement for tracking Blacklist in AM Launching

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201139#comment-15201139
 ] 

Vinod Kumar Vavilapalli commented on YARN-4576:
---

Please read my comments on YARN-4837, this whole "AM blacklisting" feature is 
unnecessarily blown way out of proportion - we just don't need this amount of 
complexity. Adding more functionality like global lists (YARN-4635), per-user 
lists (YARN-4790), pluggable blacklisting ((!)) (YARN-4636) etc will makes 
things far worse.

Containers are marked DISKS_FAILED only if all the disks have become bad, in 
which case the node itself becomes unhealthy. So there is no need for 
blacklisting per app at all !!

If an AM is killed due to memory over-flow, blacklisting the node will not help 
at all!

Overall, like I commented on [the JIRA 
YARN-4790|https://issues.apache.org/jira/browse/YARN-4790?focusedCommentId=15191217=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15191217],
 what we need is to not penalize applications for system related issues. When 
YARN finds a node with configuration / permission issues, it should itself take 
an action to (a) avoid scheduling on that node, (b) alert administrators etc. 
Implementing heuristics for app / user level blacklisting to work-around 
platform problems should be a last-ditch effort. We did that in Hadoop 1 
MapReduce as we didn't have clear demarcation between app vs system failures. 
But that isn't the case with YARN - part of the reason why we never implemented 
heuristics based per-app blacklisting in YARN - we left that completely up to 
applications.





> Enhancement for tracking Blacklist in AM Launching
> --
>
> Key: YARN-4576
> URL: https://issues.apache.org/jira/browse/YARN-4576
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: EnhancementAMLaunchingBlacklist.pdf
>
>
> Before YARN-2005, YARN blacklist mechanism is to track the bad nodes by AM:  
> If AM tried to launch containers on a specific node get failed for several 
> times, AM will blacklist this node in future resource asking. This mechanism 
> works fine for normal containers. However, from our observation on behaviors 
> of several clusters: if this problematic node launch AM failed, then RM could 
> pickup this problematic node to launch next AM attempts again and again that 
> cause application failure in case other functional nodes are busy. In normal 
> case, the customized healthy checker script cannot be so sensitive to mark 
> node as unhealthy when one or two containers get launched failed. 
> After YARN-2005, we can have a BlacklistManager in each RMapp, so those nodes 
> who launching AM attempts failed for specific application before will get 
> blacklisted. To get rid of potential risks that all nodes being blacklisted 
> by BlacklistManager, a disable-failure-threshold is involved to stop adding 
> more nodes into blacklist if hit certain ratio already. 
> There are already some enhancements for this AM blacklist mechanism: 
> YARN-4284 is to address the more wider case for AM container get launched 
> failure and YARN-4389 tries to make configuration settings available for 
> change by App to meet app specific requirement. However, there are still 
> several gaps to address more scenarios:
> 1. We may need a global blacklist instead of each app maintain a separated 
> one. The reason is: AM could get more chance to fail if other AM get failed 
> before. A quick example is: in a busy cluster, all nodes are busy except two 
> problematic nodes: node a and node b, app1 already submit and get failed in 
> two AM attempts on a and b. app2 and other apps should wait for other busy 
> nodes rather than waste attempts on these two problematic nodes.
> 2. If AM container failure is recognized as global event instead app own 
> issue, we should consider the blacklist is not a permanent thing but with a 
> specific time window. 
> 3. We could have user defined black list polices to address more possible 
> cases and scenarios, so it reasonable to make blacklist policy pluggable.
> 4. For some test scenario, we could have whitelist mechanism for AM launching.
> 5. Some minor issues: it sounds like NM reconnect won't refresh blacklist so 
> far.
> Will try to address all issues here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200116#comment-15200116
 ] 

Hudson commented on YARN-4785:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9473 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9473/])
YARN-4785. inconsistent value type of the type field for LeafQueueInfo 
(junping_du: rev ca8106d2dd03458944303d93679daa03b1d82ad5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java


> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Fix For: 2.8.0, 2.7.3, 2.6.5
>
> Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, 
> YARN-4785.branch-2.7.001.patch
>
>
> I see inconsistent value type ( String and Array ) of the "type" field for 
> LeafQueueInfo in response of RM REST API - cluster/scheduler
> as per the spec it should be always String.
> here is the sample output ( removed non-relevant fields )
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-19 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4746:
---
Attachment: 0003-YARN-4746.patch

[~ste...@apache.org]
Thanks you for review.Uploading patch with below changes
# testInvalidAppAttempts corrected to check invalid attempt earlier was 
checking invalid appId
# Moved parse validation for applicationID to WebAppUtil and re-factoring done
Please do review

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch, 
> 0003-YARN-4746.patch
>
>
> I'm seeing somewhere in the WS API tests of mine an error with exception 
> conversion of  a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services implies a same problem
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised argument and explicitly converting it



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-4390) Consider container request size during CS preemption

2016-03-19 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reopened YARN-4390:
--
  Assignee: Wangda Tan  (was: Eric Payne)

{quote}
Since YARN-4108 doesn't solve all the issues. (I planned to solve this together 
with YARN-4108, but YARN-4108 only tackled half of the problem: when containers 
selected, only preempt useful containers). However, we need select container 
more clever based on requirement. I'm thinking about this recently and I plan 
to make some progresses as soon as possible. May I reopen this JIRA and take 
over from you?
{quote}
[~leftnoteasy], I had forgotten that we had closed this JIRA in favor of 
YARN-4108. Yes, I had noticed that the selection of containers to preempt in 
YARN-4108 do not actually consider the properties of the needed resources like 
size or locality. Even still, YARN-4108 is a big improvement and does prevent 
unnecessary preemption. However, you are correct, implementing this JIRA would 
eliminate some extra event passing and processing if killable containers are 
rejected over and over.

I am reopening and assigning to you.

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-03-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200211#comment-15200211
 ] 

Varun Saxena commented on YARN-4517:


Filed YARN-4835

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need nodes page added to next generation web UI, similar to existing 
> RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-03-19 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198862#comment-15198862
 ] 

Anoop Sam John commented on YARN-4736:
--

When one RS is down, those regions in that would get moved to other RS(s).  And 
if that was a META serving RS, then META will get moved to new RS. So any way 
next retries may get things moving. The issue here was that the entire HBase 
cluster went down and no one brought it back. 
There are certain must fix things in HBase. Will talk abt that in HBase jira.


> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: NM_Hang_hbase1.0.3.tar.gz, hbaseException.log, 
> threaddump.log
>
>
> Faced some issues while running ATSv2 in single node Hadoop cluster and in 
> the same node had launched Hbase with embedded zookeeper.
> # Due to some NPE issues i was able to see NM was trying to shutdown, but the 
> NM daemon process was not completed due to the locks.
> # Got some exception related to Hbase after application finished execution 
> successfully. 
> will attach logs and the trace for the same



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4829) Add support for binary units

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198456#comment-15198456
 ] 

Hadoop QA commented on YARN-4829:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 46s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
36s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
39s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 3s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 42s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 
7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 56s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 13s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 

[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2016-03-19 Thread Feng Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan updated YARN-4336:

Description: 
Hi folks after performing some debug for our Unix Engineering and Active 
Directory teams it was discovered that on YARN Container Initialization a call 
via Hadoop Common AccessControlList.java:

  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }

Unfortunately with the security call to check access on 
"appattempt_X_X_X" will always return false but will make 
unnecessary calls to NameSwitch service on linux which will call things like 
SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
userid's causing excessive load on LDAP.

For now our tactical work around is as follows:



Example of VASD Debug log showing the lookups for one task attempt 32 of them:
{code}
/**
   * Checks if a user represented by the provided {@link UserGroupInformation}
   * is a member of the Access Control List
   * @param ugi UserGroupInformation to check if contained in the ACL
   * @return true if ugi is member of the list
   */
  public final boolean isUserInList(UserGroupInformation ugi) {
if (allAllowed || users.contains(ugi.getShortUserName())) {
  return true;
} else {
String patternString = "^appattempt_\\d+_\\d+_\\d+$";

Pattern pattern = Pattern.compile(patternString);

Matcher matcher = pattern.matcher(ugi.getShortUserName());
boolean matches = matcher.matches();
if (matches) {
LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
GROUPS!!");;
return false;
}


  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }
}
return false;
  }

  public boolean isUserAllowed(UserGroupInformation ugi) {
return isUserInList(ugi);
  }{code}
One task:
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d 

[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort

2016-03-19 Thread Zephyr Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201747#comment-15201747
 ] 

Zephyr Guo commented on YARN-4743:
--

I am trying to solve the issue, but I am failed.
In my opinion, the issue cause by concurrent operation on {{FSAppAttempt}}.When 
{{FSLeafQueue}} is sorting FSAppAttempt, the inner {{Resource}} of FsAppAttempt 
is modified.In this case, {{FairShareComparator}} may cannot work 
correctly.Base on this idea, I write YARN-4743-cdh5.4.7.patch(I have 
attached).The patch use snapshot to protect elements during the sorting.Sadly, 
this problem doesn't resolve with the patch.I got same exception on sorting and 
more frequently crash.I begin to doubt whether the comparator have a problem 
really.I reviewed {{FairShareComparator}} code and simulate all cases, but did 
not found any bugs.

I need some idea. I'd like to verify two things.1)Can inner Resource be 
modified during the sorting?Who could review it for me? 2)Does comparator also 
have mistakes really or my patch is incorrect?

I doubt that float-point precision in comparator, but it's hard to reappear in 
test cluster(never reappear). It happen with low probability in larger cluster.

> ResourceManager crash because TimSort
> -
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Zephyr Guo
>Assignee: Yufei Gu
>
> {code}
> 2016-02-26 14:08:50,821 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>  at java.util.TimSort.mergeHi(TimSort.java:868)
>  at java.util.TimSort.mergeAt(TimSort.java:485)
>  at java.util.TimSort.mergeCollapse(TimSort.java:410)
>  at java.util.TimSort.sort(TimSort.java:214)
>  at java.util.TimSort.sort(TimSort.java:173)
>  at java.util.Arrays.sort(Arrays.java:659)
>  at java.util.Collections.sort(Collections.java:217)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resouce}} while we are sorting 
> {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator comparator = policy.getComparator();
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
> ..
>   s1.getResourceUsage(), minShare1);
>   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>   s2.getResourceUsage(), minShare2);
>   minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare1, 
> ONE).getMemory();
>   minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare2, 
> ONE).getMemory();
> ..
> {code}
> {{getResourceUsage}} will return current Resource. The current Resource is 
> unstable. 
> {code:title=FSAppAttempt.java}
> @Override
>   public Resource getResourceUsage() {
> // Here the getPreemptedResources() always return zero, except in
> // a preemption round
> return Resources.subtract(getCurrentConsumption(), 
> getPreemptedResources());
>   }
> {code}
> {code:title=SchedulerApplicationAttempt}
>  public Resource getCurrentConsumption() {
> return currentConsumption;
>   }
> // This method may modify current Resource.
> public synchronized void 

[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2016-03-19 Thread Feng Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan updated YARN-4336:

Description: 
Hi folks after performing some debug for our Unix Engineering and Active 
Directory teams it was discovered that on YARN Container Initialization a call 
via Hadoop Common AccessControlList.java:

  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }

Unfortunately with the security call to check access on 
"appattempt_X_X_X" will always return false but will make 
unnecessary calls to NameSwitch service on linux which will call things like 
SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
userid's causing excessive load on LDAP.

For now our tactical work around is as follows:



Example of VASD Debug log showing the lookups for one task attempt 32 of them:
{
/**
   * Checks if a user represented by the provided {@link UserGroupInformation}
   * is a member of the Access Control List
   * @param ugi UserGroupInformation to check if contained in the ACL
   * @return true if ugi is member of the list
   */
  public final boolean isUserInList(UserGroupInformation ugi) {
if (allAllowed || users.contains(ugi.getShortUserName())) {
  return true;
} else {
String patternString = "^appattempt_\\d+_\\d+_\\d+$";

Pattern pattern = Pattern.compile(patternString);

Matcher matcher = pattern.matcher(ugi.getShortUserName());
boolean matches = matcher.matches();
if (matches) {
LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
GROUPS!!");;
return false;
}


  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }
}
return false;
  }

  public boolean isUserAllowed(UserGroupInformation ugi) {
return isUserInList(ugi);
  }}
One task:
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]: 

[jira] [Commented] (YARN-4390) Consider container request size during CS preemption

2016-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199987#comment-15199987
 ] 

Wangda Tan commented on YARN-4390:
--

Thanks [~eepayne]/[~sunilg].
Will add implementation soon.

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2016-03-19 Thread Feng Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan updated YARN-4336:

Description: 
Hi folks after performing some debug for our Unix Engineering and Active 
Directory teams it was discovered that on YARN Container Initialization a call 
via Hadoop Common AccessControlList.java:

  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }

Unfortunately with the security call to check access on 
"appattempt_X_X_X" will always return false but will make 
unnecessary calls to NameSwitch service on linux which will call things like 
SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
userid's causing excessive load on LDAP.

For now our tactical work around is as follows:



Example of VASD Debug log showing the lookups for one task attempt 32 of them:
{code}
/**
   * Checks if a user represented by the provided {@link UserGroupInformation}
   * is a member of the Access Control List
   * @param ugi UserGroupInformation to check if contained in the ACL
   * @return true if ugi is member of the list
   */
  public final boolean isUserInList(UserGroupInformation ugi) {
if (allAllowed || users.contains(ugi.getShortUserName())) {
  return true;
} else {
String patternString = "^appattempt_\\d+_\\d+_\\d+$";

Pattern pattern = Pattern.compile(patternString);

Matcher matcher = pattern.matcher(ugi.getShortUserName());
boolean matches = matcher.matches();
if (matches) {
LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
GROUPS!!");;
return false;
}


  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }
}
return false;
  }

  public boolean isUserAllowed(UserGroupInformation ugi) {
return isUserInList(ugi);
  }{code}
One task:
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d 

[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2016-03-19 Thread Feng Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan updated YARN-4336:

Description: 
Hi folks after performing some debug for our Unix Engineering and Active 
Directory teams it was discovered that on YARN Container Initialization a call 
via Hadoop Common AccessControlList.java:

  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }

Unfortunately with the security call to check access on 
"appattempt_X_X_X" will always return false but will make 
unnecessary calls to NameSwitch service on linux which will call things like 
SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
userid's causing excessive load on LDAP.

For now our tactical work around is as follows:



Example of VASD Debug log showing the lookups for one task attempt 32 of them:
/**
   * Checks if a user represented by the provided {@link UserGroupInformation}
   * is a member of the Access Control List
   * @param ugi UserGroupInformation to check if contained in the ACL
   * @return true if ugi is member of the list
   */
  public final boolean isUserInList(UserGroupInformation ugi) {
if (allAllowed || users.contains(ugi.getShortUserName())) {
  return true;
} else {
String patternString = "^appattempt_\\d+_\\d+_\\d+$";

Pattern pattern = Pattern.compile(patternString);

Matcher matcher = pattern.matcher(ugi.getShortUserName());
boolean matches = matcher.matches();
if (matches) {
LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
GROUPS!!");;
return false;
}


  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }
}
return false;
  }

  public boolean isUserAllowed(UserGroupInformation ugi) {
return isUserInList(ugi);
  } 
One task:
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]: 

[jira] [Commented] (YARN-4829) Add support for binary units

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198171#comment-15198171
 ] 

Hadoop QA commented on YARN-4829:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 33s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
28s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
43s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 18s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 
7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 54s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 12s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 

[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()

2016-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197504#comment-15197504
 ] 

Hudson commented on YARN-4593:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9469 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9469/])
YARN-4593 Deadlock in AbstractService.getConfig() (stevel) (stevel: rev 
605fdcbb81687c73ba91a3bd0d607cabd3dc5a67)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/service/AbstractService.java


> Deadlock in AbstractService.getConfig()
> ---
>
> Key: YARN-4593
> URL: https://issues.apache.org/jira/browse/YARN-4593
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.2
> Environment: AM restarting on kerberized cluster
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 2.9.0
>
> Attachments: YARN-4593-001.patch
>
>
> SLIDER-1052 has found a deadlock which can arise in it during AM restart. 
> Looking at the thread trace, one of the blockages is actually 
> {{AbstractService.getConfig()}} —this is synchronized and so blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200374#comment-15200374
 ] 

Hadoop QA commented on YARN-4820:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 7s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 34s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 41s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 44s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 302m 49s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK 

[jira] [Updated] (YARN-1508) Document Dynamic Resource Configuration feature

2016-03-19 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1508:
-
Summary: Document Dynamic Resource Configuration feature  (was: Rename 
ResourceOption and document resource over-commitment cases)

> Document Dynamic Resource Configuration feature
> ---
>
> Key: YARN-1508
> URL: https://issues.apache.org/jira/browse/YARN-1508
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>
> Per Vinod's comment in 
> YARN-312(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087)
>  and Bikas' comment in 
> YARN-311(https://issues.apache.org/jira/browse/YARN-311?focusedCommentId=13848615),
>  the name of ResourceOption is not good enough for being understood. Also, we 
> need to document more on resource overcommitment time and use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4595) Add support for configurable read-only mounts

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200385#comment-15200385
 ] 

Hadoop QA commented on YARN-4595:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 5s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 40s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 20s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794016/YARN-4595.2.patch |
| JIRA Issue | YARN-4595 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b1bf64ea3a7a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / dc951e6 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  

[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-03-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201546#comment-15201546
 ] 

Sunil G commented on YARN-4837:
---

Thanks [~vinodkv] for pitching in.

YARN-2005 blacklists nodes if AM container launch failed due to DISK_FAILED. 
And after YARN-4284, blacklisting for am-container-failure is made for all 
container failure except PREEMPTED. There were few discussion on usecase 
aspects for this change.

If blacklisting (am container failure) feature is enabled in cluster level, all 
applications will be forced to comply the blacklisting rule. YARN-4389 had also 
an option to disable this feature from application end. Also it could control 
the threshold if its too strict (and vice versa). Yes, agreeing to your point 
and its early for user  to take blacklisting decisions w/o having much 
needed/useful information. But by seeing the current aggressive nature, this 
change was helping in skipping this feature.

Agreeing that this has to be a controllable feature without causing problems in 
a busy cluster. I think may be a time based purging solution can be ideal to 
allow same app to use the node again.

> User facing aspects of 'AM blacklisting' feature need fixing
> 
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed 
> before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4502) Fix two AM containers get allocated when AM restart

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200423#comment-15200423
 ] 

Vinod Kumar Vavilapalli commented on YARN-4502:
---

[~zxu] / [~djp]
bq. It looks like the implementation for 
AbstractYarnScheduler#getApplicationAttempt(ApplicationAttemptId 
applicationAttemptId) is also confusing.
This is by design - see YARN-1041 - we want to route all the events destined 
for AppAttempt *only* to the current attempt. We should just document this and 
move on.

> Fix two AM containers get allocated when AM restart
> ---
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-4502-20160114.txt, YARN-4502-20160212.txt
>
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for new AM pid 
> Here, 2nd AM container was suppose to be started on  
> container_e12_1450825622869_0001_02_01. But AM was not launched on 
> container_e12_1450825622869_0001_02_01. It was in AQUIRED state. 
> On other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: RM should not start 2 containers for starting AM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201265#comment-15201265
 ] 

Hadoop QA commented on YARN-4746:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 12s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 44s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 8s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 53s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 20s {color} 
| {color:red} 

[jira] [Created] (YARN-4828) Create a pull request template for github

2016-03-19 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-4828:


 Summary: Create a pull request template for github
 Key: YARN-4828
 URL: https://issues.apache.org/jira/browse/YARN-4828
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: build
Affects Versions: 3.0.0
 Environment: github
Reporter: Steve Loughran
Priority: Minor


We're starting to see PRs appear without any JIRA, explanation etc. These are 
going to be ignored without them.

It's possible to [create a PR text 
template](https://help.github.com/articles/creating-a-pull-request-template-for-your-repository/)
 under {{.github/PULL_REQUEST_TEMPLATE}}

We can do such a template, which provides template summary points, such as:

* which JIRA
* if against an object store, how did you test it?
* if its a shell script, how did you test it?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201146#comment-15201146
 ] 

Vinod Kumar Vavilapalli commented on YARN-2005:
---

-1 for backporting this, while I understand that the original feature-ask is 
useful for avoiding AM scheduling getting blocked, there are far too many 
issues with the feature as it is. Please see my comments on YARN-4576 and 
YARN-4837.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch, YARN-2005.009.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4832) NM side resource value should get updated if change applied in RM side

2016-03-19 Thread Junping Du (JIRA)
Junping Du created YARN-4832:


 Summary: NM side resource value should get updated if change 
applied in RM side
 Key: YARN-4832
 URL: https://issues.apache.org/jira/browse/YARN-4832
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical


Now, if we execute CLI to update node (single or multiple) resource in RM side, 
NM will not receive any notification. It doesn't affect resource scheduling but 
will make resource usage metrics reported by NM a bit weird. We should sync up 
new resource between RM and NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197561#comment-15197561
 ] 

Hadoop QA commented on YARN-4820:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 28s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 35s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 42s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 45s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | 

[jira] [Updated] (YARN-4390) Consider container request size during CS preemption

2016-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4390:
-
Attachment: YARN-4390-design.1.pdf

Uploaded ver.1 design doc for review.

You can jump to "Proposal" part directly if you are already familiar with 
preemption implementation.

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles

2016-03-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197440#comment-15197440
 ] 

Varun Vasudev commented on YARN-3926:
-

[~asuresh] - I was speaking with [~leftnoteasy] offline. I suspect we'll have 
to support 3 modes for the RM-NM handshake which admins can configure  -
# Strict - the RM and NM resource types must match
# RM subset - as long the NM resource types are a superset of the RM, the 
handshake proceeds - I believe this will address your concerns. Correct?
# Allow mismatch - the handshake will not fail due to missing resource types - 
missing resource types are presumed to be of value 0 by the RM.

Does that make sense?

> Extend the YARN resource model for easier resource-type management and 
> profiles
> ---
>
> Key: YARN-3926
> URL: https://issues.apache.org/jira/browse/YARN-3926
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Proposal for modifying resource model and profiles.pdf
>
>
> Currently, there are efforts to add support for various resource-types such 
> as disk(YARN-2139), network(YARN-2140), and  HDFS bandwidth(YARN-2681). These 
> efforts all aim to add support for a new resource type and are fairly 
> involved efforts. In addition, once support is added, it becomes harder for 
> users to specify the resources they need. All existing jobs have to be 
> modified, or have to use the minimum allocation.
> This ticket is a proposal to extend the YARN resource model to a more 
> flexible model which makes it easier to support additional resource-types. It 
> also considers the related aspect of “resource profiles” which allow users to 
> easily specify the various resources they need for any given container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2016-03-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200170#comment-15200170
 ] 

Varun Saxena commented on YARN-2962:


bq. Could you also explain in the parameter description why one would want to 
change it from the default of 0 and how to know what a good split value would 
be?
Ok...

bq. I'm not sure this constant adds anything. I found it made the code harder 
to read than just hard-coding in 0.
Hmm...If its harder to read, I can put 0 everywhere. This value should not 
change in future.

bq. This violates the Principle of Least Astonishment. At least log a warning 
that you're not doing what the user said to.
Correct, a warning log should be added.

bq. I don't think the accessors are needed. 
Yes, they are not required.

bq. Might want to swap those method names.
Agree. safeDeleteIfExists makes more sense in the other case.

bq. should be  HashMap attempts = new 
HashMap<>();
Yeah, <> can be used to reduce clutter.

Regarding other comments, will add more comments to make tests and main code 
more easier to read and fix missing javadocs.

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.04.patch, 
> YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> they individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2016-03-19 Thread Feng Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan updated YARN-4336:

Description: 
Hi folks after performing some debug for our Unix Engineering and Active 
Directory teams it was discovered that on YARN Container Initialization a call 
via Hadoop Common AccessControlList.java:

  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }

Unfortunately with the security call to check access on 
"appattempt_X_X_X" will always return false but will make 
unnecessary calls to NameSwitch service on linux which will call things like 
SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
userid's causing excessive load on LDAP.

For now our tactical work around is as follows:



Example of VASD Debug log showing the lookups for one task attempt 32 of them:
{code
/**
   * Checks if a user represented by the provided {@link UserGroupInformation}
   * is a member of the Access Control List
   * @param ugi UserGroupInformation to check if contained in the ACL
   * @return true if ugi is member of the list
   */
  public final boolean isUserInList(UserGroupInformation ugi) {
if (allAllowed || users.contains(ugi.getShortUserName())) {
  return true;
} else {
String patternString = "^appattempt_\\d+_\\d+_\\d+$";

Pattern pattern = Pattern.compile(patternString);

Matcher matcher = pattern.matcher(ugi.getShortUserName());
boolean matches = matcher.matches();
if (matches) {
LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
GROUPS!!");;
return false;
}


  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }
}
return false;
  }

  public boolean isUserAllowed(UserGroupInformation ugi) {
return isUserInList(ugi);
  }}
One task:
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:57:18 xhadoopm5d vasd[20741]: 

[jira] [Updated] (YARN-4829) Add support for binary units

2016-03-19 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4829:

Attachment: YARN-4829-YARN-3926.002.patch

Thanks for the review [~asuresh]!

Uploaded a new patch with new test cases and some re-factoring of code. With 
regards to the Ki, Mi, etc - the binary prefix symbols used are the ones in the 
IEC standard. It's also the format used by Kubernetes.

> Add support for binary units
> 
>
> Key: YARN-4829
> URL: https://issues.apache.org/jira/browse/YARN-4829
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4829-YARN-3926.001.patch, 
> YARN-4829-YARN-3926.002.patch
>
>
> The units conversion util should have support for binary units.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4829) Add support for binary units

2016-03-19 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4829:

Attachment: YARN-4829-YARN-3926.004.patch

Uploaded a new patch fixing the test failure.

> Add support for binary units
> 
>
> Key: YARN-4829
> URL: https://issues.apache.org/jira/browse/YARN-4829
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4829-YARN-3926.001.patch, 
> YARN-4829-YARN-3926.002.patch, YARN-4829-YARN-3926.003.patch, 
> YARN-4829-YARN-3926.004.patch
>
>
> The units conversion util should have support for binary units.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted

2016-03-19 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-4783.
--
Resolution: Won't Fix

Resolving as Won't Fix per the above discussion since we don't want to keep an 
application's security tokens valid for an arbitrary time after the application 
completes.

> Log aggregation failure for application when Nodemanager is restarted 
> --
>
> Key: YARN-4783
> URL: https://issues.apache.org/jira/browse/YARN-4783
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>
> Scenario :
> =
> 1.Start NM with user dsperf:hadoop
> 2.Configure linux-execute user as dsperf
> 3.Submit application with yarn user 
> 4.Once few containers are allocated to NM 1
> 5.Nodemanager 1 is stopped  (wait for expiry )
> 6.Start node manager after application is completed
> 7.Check the log aggregation is happening for the containers log in NMLocal 
> directory
> Expect Output :
> ===
> Log aggregation should be succesfull
> Actual Output :
> ===
> Log aggreation not successfull



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4820:

Attachment: YARN-4820.002.patch

Uploaded a new patch to address the findbugs warnings.

> ResourceManager web redirects in HA mode drops query parameters
> ---
>
> Key: YARN-4820
> URL: https://issues.apache.org/jira/browse/YARN-4820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4820.001.patch, YARN-4820.002.patch
>
>
> The RMWebAppFilter redirects http requests from the standby to the active. 
> However it drops all the query parameters when it does the redirect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4830) Add support for resource types in the nodemanager

2016-03-19 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-4830:
---

 Summary: Add support for resource types in the nodemanager
 Key: YARN-4830
 URL: https://issues.apache.org/jira/browse/YARN-4830
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198410#comment-15198410
 ] 

Karthik Kambatla commented on YARN-4686:


On a cursory look, the patch looks reasonable. One thing that caught my eye: 
are we explicitly transitioning the RM to active even when HA is not enabled? 
Is that required? 

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, 
> YARN-4686.005.patch, YARN-4686.006.patch
>
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-998) Persistent resource change during NM/RM restart

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199584#comment-15199584
 ] 

Junping Du commented on YARN-998:
-

Hi [~jianhe], would you kindly review the patch again? Thanks!

> Persistent resource change during NM/RM restart
> ---
>
> Key: YARN-998
> URL: https://issues.apache.org/jira/browse/YARN-998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-998-sample.patch, YARN-998-v1.patch, 
> YARN-998-v2.patch
>
>
> When NM is restarted by plan or from a failure, previous dynamic resource 
> setting should be kept for consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199865#comment-15199865
 ] 

Hadoop QA commented on YARN-4785:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} 
| {color:red} YARN-4785 does not apply to branch-2.6. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12793996/YARN-4785.branch-2.6.001.patch
 |
| JIRA Issue | YARN-4785 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10806/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, 
> YARN-4785.branch-2.7.001.patch
>
>
> I see inconsistent value type ( String and Array ) of the "type" field for 
> LeafQueueInfo in response of RM REST API - cluster/scheduler
> as per the spec it should be always String.
> here is the sample output ( removed non-relevant fields )
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4823) Refactor the nested reservation id field in listReservation to simple string field

2016-03-19 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200852#comment-15200852
 ] 

Subru Krishnan commented on YARN-4823:
--

The test case failures are consistent and unrelated and are covered in YARN-4478

> Refactor the nested reservation id field in listReservation to simple string 
> field
> --
>
> Key: YARN-4823
> URL: https://issues.apache.org/jira/browse/YARN-4823
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-4823-v1.patch
>
>
> The listReservation REST API returns a ReservationId field which has a nested 
> id field which is also called ReservationId. This JIRA proposes to rename the 
> nested field to a string as it's easier to read and moreover what the 
> update/delete APIs take in as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4835) [YARN-3368] REST API related changes for new Web UI

2016-03-19 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4835:
---
Description: 
Following things need to be added for AM related web pages.

1. Support task state query param in REST URL for fetching tasks.
2. Support task attempt state query param in REST URL for fetching task 
attempts.
3. A new REST endpoint to fetch counters for each task belonging to a job. Also 
have a query param for counter name.
   i.e. something like :
  {{/jobs/\{jobid\}/taskCounters}}
4. A REST endpoint in NM for fetching all log files associated with a 
container. Useful if logs served by NM.

> [YARN-3368] REST API related changes for new Web UI
> ---
>
> Key: YARN-4835
> URL: https://issues.apache.org/jira/browse/YARN-4835
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Following things need to be added for AM related web pages.
> 1. Support task state query param in REST URL for fetching tasks.
> 2. Support task attempt state query param in REST URL for fetching task 
> attempts.
> 3. A new REST endpoint to fetch counters for each task belonging to a job. 
> Also have a query param for counter name.
>i.e. something like :
>   {{/jobs/\{jobid\}/taskCounters}}
> 4. A REST endpoint in NM for fetching all log files associated with a 
> container. Useful if logs served by NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4815) ATS 1.5 timelineclinet impl try to create attempt directory for every event call

2016-03-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4815:

Attachment: YARN-4815.2.patch

rebase the patch

> ATS 1.5 timelineclinet impl try to create attempt directory for every event 
> call
> 
>
> Key: YARN-4815
> URL: https://issues.apache.org/jira/browse/YARN-4815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4815.1.patch, YARN-4815.2.patch
>
>
> ATS 1.5 timelineclinet impl, try to create attempt directory for every event 
> call. Since per attempt only one call to create directory is enough, this is 
> causing perf issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4808) SchedulerNode can use a few more cosmetic changes

2016-03-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202938#comment-15202938
 ] 

Karthik Kambatla commented on YARN-4808:


[~leftnoteasy] - will you be able to review this? 

> SchedulerNode can use a few more cosmetic changes
> -
>
> Key: YARN-4808
> URL: https://issues.apache.org/jira/browse/YARN-4808
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4808-1.patch
>
>
> We have made some cosmetic changes to SchedulerNode recently. While working 
> on YARN-4511, realized we could improve it a little more:
> # Remove volatile variables - don't see the need for them being volatile
> # Some methods end up doing very similar things, so consolidating them
> # Renaming totalResource to capacity. YARN-4511 plans to add inflatedCapacity 
> to include the un-utilized resources, and having two totals can be a little 
> confusing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200736#comment-15200736
 ] 

Hadoop QA commented on YARN-4062:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
12s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
14s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 45s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 23s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_74 with JDK v1.8.0_74 
generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 45s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_95 with JDK v1.7.0_95 
generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} hadoop-yarn-project/hadoop-yarn: patch generated 0 
new + 212 unchanged - 1 fixed = 212 total (was 213) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s 
{color} | {color:green} 

[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-03-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199761#comment-15199761
 ] 

Varun Saxena commented on YARN-4517:


[~gtCarrera9], as per discussion with Wangda, unification of app/container 
information, we can do after this branch goes into trunk. I think we can 
definitely unify and have a single container page. I will do this later as part 
of another JIRA. NM single app page, will have to see. 

Regarding this, 
bq. However, in the new UI, when the node is shutdown, I just could not hold 
myself to try to find a link to the NM logs to figure out why. I think the 
workflow here changed slightly, hence the user experience. Some other projects 
like Apache Ambari may want to maintain those information as well, but in YARN, 
it will be great if we could provide our users a way out. Maybe something like: 
"You have to ssh to the missing nodes' /xxx/ dir to look for the logs" 
would even be helpful.
In old UI, we did not show anything if node was shutdown. So here, the change 
is that we are showing even the nodes which have been SHUTDOWN. Just thought 
that this might be useful info for the admin.
Now, some information regarding on which path, user can check the logs may be 
useful but currently this information is not available in RM. I am not sure how 
NM can report this to RM. You can report it in node registration but do we need 
to ?
Ambari may have this information because I guess it knows exactly where 
installation has been done through it.

Regarding node labels, we can add it in REST response indicating if labels are 
enabled or not. We can do this later because this would require another JIRA 
for REST changes.

500 error I think you are encountering on app page. I will fix it while doing 
AM pages or as a separate JIRA.


> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need nodes page added to next generation web UI, similar to existing 
> RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2016-03-19 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198204#comment-15198204
 ] 

Srikanth Kandula commented on YARN-2965:


Go for it :-)  We have some dummy code that was good enough to get numbers and 
experiments but are not actively working on pushing that in. Inigo, i will 
share that code with you offline so you can pick any useful pieces if you like 
from that.

> Enhance Node Managers to monitor and report the resource usage on machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197235#comment-15197235
 ] 

Varun Vasudev commented on YARN-4820:
-

[~steve_l] - sorry I didn't understand the case you mentioned. You're talking 
about a scenario where the active RM web services redirect you to a standby RM?

> ResourceManager web redirects in HA mode drops query parameters
> ---
>
> Key: YARN-4820
> URL: https://issues.apache.org/jira/browse/YARN-4820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4820.001.patch
>
>
> The RMWebAppFilter redirects http requests from the standby to the active. 
> However it drops all the query parameters when it does the redirect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-03-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200414#comment-15200414
 ] 

Sangjin Lee commented on YARN-4062:
---

LGTM pending jenkins. I'll commit it once the jenkins comes back clean.

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.04.patch, 
> YARN-4062-YARN-2928.05.patch, YARN-4062-YARN-2928.06.patch, 
> YARN-4062-YARN-2928.07.patch, YARN-4062-YARN-2928.08.patch, 
> YARN-4062-YARN-2928.09.patch, YARN-4062-YARN-2928.1.patch, 
> YARN-4062-feature-YARN-2928.01.patch, YARN-4062-feature-YARN-2928.02.patch, 
> YARN-4062-feature-YARN-2928.03.patch
>
>
> As part of YARN-3901, coprocessor and scanner is being added for storing into 
> the flow_run table. It also needs a flush & compaction processing in the 
> coprocessor and perhaps a new scanner to deal with the data during flushing 
> and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-03-19 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-4837:
-

 Summary: User facing aspects of 'AM blacklisting' feature need 
fixing
 Key: YARN-4837
 URL: https://issues.apache.org/jira/browse/YARN-4837
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.

Looking at the 'AM blacklisting feature', I see several things to be fixed 
before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4823) Refactor the nested reservation id field in listReservation to simple string field

2016-03-19 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4823:
-
Attachment: YARN-4823-v1.patch

Attaching a patch that refactors the nested reservation id field in 
listReservation to simple string field

> Refactor the nested reservation id field in listReservation to simple string 
> field
> --
>
> Key: YARN-4823
> URL: https://issues.apache.org/jira/browse/YARN-4823
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-4823-v1.patch
>
>
> The listReservation REST API returns a ReservationId field which has a nested 
> id field which is also called ReservationId. This JIRA proposes to rename the 
> nested field to a string as it's easier to read and moreover what the 
> update/delete APIs take in as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4820) ResourceManager web redirects in HA mode drops query parameters

2016-03-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200204#comment-15200204
 ] 

Steve Loughran commented on YARN-4820:
--

ok, I'm wrong, this is a 307, not a 302 ... ignore everything I was complaining 
about

> ResourceManager web redirects in HA mode drops query parameters
> ---
>
> Key: YARN-4820
> URL: https://issues.apache.org/jira/browse/YARN-4820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4820.001.patch, YARN-4820.002.patch
>
>
> The RMWebAppFilter redirects http requests from the standby to the active. 
> However it drops all the query parameters when it does the redirect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4829) Add support for binary units

2016-03-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197795#comment-15197795
 ] 

Arun Suresh commented on YARN-4829:
---

The patch looks mostly good. Thanks [~vvasudev]

Couplo minor nits:
* Maybe rename *Mi, Ti, Pi* to *Me, Te, Pe* or maybe replace eveything with *b 
(Kb, Mb..) to signify binary ?
* Can we have a test case that converts between a binary to non-binary (K to 
Ki) for eg. ? 

> Add support for binary units
> 
>
> Key: YARN-4829
> URL: https://issues.apache.org/jira/browse/YARN-4829
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4829-YARN-3926.001.patch
>
>
> The units conversion util should have support for binary units.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt

2016-03-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201669#comment-15201669
 ] 

Sangjin Lee commented on YARN-4839:
---

Could this be the same issue as pointed out by YARN-4247? We did see this issue 
in our environment (which is 2.6 + patches), but that was because we backported 
YARN-2005 without YARN-3361. Not sure if there has been a more recent 
regression.

> ResourceManager deadlock between RMAppAttemptImpl and 
> SchedulerApplicationAttempt
> -
>
> Key: YARN-4839
> URL: https://issues.apache.org/jira/browse/YARN-4839
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> Hit a deadlock in the ResourceManager as one thread was holding the 
> SchedulerApplicationAttempt lock and trying to call 
> RMAppAttemptImpl.getMasterContainer while another thread had the 
> RMAppAttemptImpl lock and was trying to call 
> SchedulerApplicationAttempt.getResourceUsageReport.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4830) Add support for resource types in the nodemanager

2016-03-19 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4830:

Component/s: (was: resourcemanager)

> Add support for resource types in the nodemanager
> -
>
> Key: YARN-4830
> URL: https://issues.apache.org/jira/browse/YARN-4830
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2016-03-19 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197237#comment-15197237
 ] 

Naganarasimha G R commented on YARN-796:


Hi [~jameszhouyi], In 2.6.0 label exclusivity is not supported and hope you are 
also aware that labels are supported only in CS

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4785) inconsistent value type of the "type" field for LeafQueueInfo in response of RM REST API - cluster/scheduler

2016-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199809#comment-15199809
 ] 

Junping Du commented on YARN-4785:
--

Thanks [~vvasudev], the patch for branch-2.6 and branch-2.7 LGTM. Will commit 
them shortly.

> inconsistent value type of the "type" field for LeafQueueInfo in response of 
> RM REST API - cluster/scheduler
> 
>
> Key: YARN-4785
> URL: https://issues.apache.org/jira/browse/YARN-4785
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Jayesh
>Assignee: Varun Vasudev
>  Labels: REST_API
> Attachments: YARN-4785.001.patch, YARN-4785.branch-2.6.001.patch, 
> YARN-4785.branch-2.7.001.patch
>
>
> I see inconsistent value type ( String and Array ) of the "type" field for 
> LeafQueueInfo in response of RM REST API - cluster/scheduler
> as per the spec it should be always String.
> here is the sample output ( removed non-relevant fields )
> {code}
> {
>   "scheduler": {
> "schedulerInfo": {
>   "type": "capacityScheduler",
>   "capacity": 100,
>   ...
>   "queueName": "root",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 0.1,
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 0.1,
> "queueName": "test-queue",
> "state": "RUNNING",
> 
>   },
>   {
> "type": [
>   "capacitySchedulerLeafQueueInfo"
> ],
> "capacity": 2.5,
> 
>   },
>   {
> "capacity": 25,
> 
> "state": "RUNNING",
> "queues": {
>   "queue": [
> {
>   "capacity": 6,
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   
> },
> {
>   "capacity": 6,
>   ...
>   "state": "RUNNING",
>   "queues": {
> "queue": [
>   {
> "type": "capacitySchedulerLeafQueueInfo",
> "capacity": 100,
> ...
>   }
> ]
>   },
>   ...
> },
> ...
>   ]
> },
> ...
>   }
> ]
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-19 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4746:
---
Attachment: 0003-YARN-4746.patch

Attaching patch after testcase fix

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch, 
> 0003-YARN-4746.patch, 0003-YARN-4746.patch
>
>
> I'm seeing somewhere in the WS API tests of mine an error with exception 
> conversion of  a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services implies a same problem
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised argument and explicitly converting it



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Consider container request size during CS preemption

2016-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198795#comment-15198795
 ] 

Wangda Tan commented on YARN-4390:
--

Hi [~eepayne],

Since YARN-4108 doesn't solve all the issues. (I planned to solve this together 
with YARN-4108, but YARN-4108 only tackled half of the problem: when containers 
selected, only preempt useful containers). However, we need select container 
more clever based on requirement. I'm thinking about this recently and I plan 
to make some progresses as soon as possible. May I reopen this JIRA and take 
over from you?

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >