[jira] [Commented] (YARN-4289) TestDistributedShell failing with bind exception

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969509#comment-14969509
 ] 

Hadoop QA commented on YARN-4289:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   6m  4s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 21s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 42s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 54s | Tests passed in 
hadoop-yarn-applications-distributedshell. |
| | |  24m 22s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768066/YARN-4289.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 4c0bae2 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9525/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9525/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9525/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9525/console |


This message was automatically generated.

> TestDistributedShell failing with bind exception
> 
>
> Key: YARN-4289
> URL: https://issues.apache.org/jira/browse/YARN-4289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-4289.patch
>
>
>  *Stacktrace* 
> {noformat}
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> ApplicationHistoryServer failed to start. Final state is STOPPED
>   at 
> org.apache.hadoop.yarn.server.MiniYARNCluster$ApplicationHistoryServerWrapper.serviceStart(MiniYARNCluster.java:726)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setupInternal(TestDistributedShell.java:91)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setup(TestDistributedShell.java:71)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:10200] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.serviceStart(ApplicationHistoryClientService.java:93)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:118)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.MiniYARNCluster$ApplicationHistoryServerWrapper$1.run(MiniYARNCluster.java:715)
> {noformat}
> https://builds.apache.org/job/Hadoop-Yarn-trunk/1290/
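
As a general illustration (not the attached YARN-4289.patch), the usual way to avoid a fixed-port bind failure like the one above is to probe for a free ephemeral port before configuring the server under test. A minimal sketch; the config key in the comment is only an example of where such a port could be plugged in:

{code}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
  /** Ask the OS for an ephemeral port that is currently free. */
  public static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    // e.g. conf.set("yarn.timeline-service.address", "localhost:" + findFreePort());
    System.out.println("Free port: " + findFreePort());
  }
}
{code}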



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969519#comment-14969519
 ] 

Xuan Gong commented on YARN-4243:
-

Thanks for the comments, [~kasha]

bq. To be consistent with the other config, can we call use zk-retries instead 
of zk.op.retries

Modified

bq. should we make the change to ActiveStandbyElector as a Common JIRA or at 
least create a Common JIRA and close it as part of this one, so the common and 
HDFS devs are aware of this change? They might want to update the way HDFS 
handles the retries situation as well.

Created https://issues.apache.org/jira/browse/HADOOP-12503 and linked the ticket.

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM would shut down if the ZK connection is down when the RM does 
> its initialization. We need to add retries for this.
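
For illustration only (not any of the attached patches), the kind of bounded retry loop being asked for might look like the sketch below; the retry count, sleep interval, and helper names are placeholders:

{code}
/**
 * Minimal sketch of a bounded retry loop around an operation that can fail
 * transiently, such as establishing a ZooKeeper connection. Hypothetical
 * helper, not the actual EmbeddedElectorService code.
 */
public final class RetryUtil {
  public interface Op {
    void run() throws Exception;
  }

  public static void runWithRetries(Op op, int maxRetries, long sleepMs)
      throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        op.run();
        return;           // success
      } catch (Exception e) {
        last = e;          // remember the failure and try again after a pause
        Thread.sleep(sleepMs);
      }
    }
    throw last;            // all attempts exhausted
  }
}
{code}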



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue

2015-10-22 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2913:
--
Attachment: YARN-2913.v4.patch

> Fair scheduler should have ability to set MaxResourceDefault for each queue
> ---
>
> Key: YARN-2913
> URL: https://issues.apache.org/jira/browse/YARN-2913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, 
> YARN-2913.v3.patch, YARN-2913.v4.patch
>
>
> Queues that are created on the fly have the max resource of the entire 
> cluster. The Fair Scheduler should have a default maxResource to control the 
> maxResource of those queues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4256:

Hadoop Flags: Reviewed

> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.7.2
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores value contains a decimal point, FairScheduler takes the 
> digits after the decimal point as the vcores.
> For the queues below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted to the queue in parallel, all of them 
> stayed in the PENDING state because the vcores value was taken as 2 (the digits 
> after the decimal point), skipping the intended value of 40.
> The pattern matching of vcores in FairSchedulerConfiguration.java should be 
> improved to either throw AllocationConfigurationException("Missing resource") 
> or use the value before the decimal point.
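
A hypothetical sketch (not the committed patch) of how a resource string could be validated so that a decimal vcores value is rejected instead of being silently misparsed; the regexes, class name, and exception type are assumptions:

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VcoresParser {
  // Accept only whole numbers in front of "vcores".
  private static final Pattern VCORES = Pattern.compile("(\\d+)\\s*vcores");
  // Detect a decimal number in front of "vcores", e.g. "40.2 vcores".
  private static final Pattern DECIMAL_VCORES =
      Pattern.compile("\\d+\\.\\d+\\s*vcores");

  public static int parseVcores(String value) {
    if (DECIMAL_VCORES.matcher(value).find()) {
      // Mirrors the "Missing resource" idea from the description above;
      // the real code would throw AllocationConfigurationException.
      throw new IllegalArgumentException("Malformed vcores in: " + value);
    }
    Matcher m = VCORES.matcher(value);
    if (!m.find()) {
      throw new IllegalArgumentException("Missing vcores in: " + value);
    }
    return Integer.parseInt(m.group(1));
  }
}
{code}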



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3739) Add reservation system recovery to RM recovery process

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969541#comment-14969541
 ] 

Hudson commented on YARN-3739:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2515 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2515/])
YARN-3739. Add reservation system recovery to RM recovery process. (adhoot: rev 
2798723a5443d04455b9d79c48d61f435ab52267)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/PlanningAlgorithm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacitySchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanEdit.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java


> Add reservation system recovery to RM recovery process
> --
>
> Key: YARN-3739
> URL: https://issues.apache.org/jira/browse/YARN-3739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.8.0
>
> Attachments: YARN-3739-v1.patch, YARN-3739-v2.patch, 
> YARN-3739-v3.patch
>
>
> YARN-1051 introduced a reservation system in the YARN RM. This JIRA tracks 
> the recovery of the reservation system in case of a RM failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3739) Add reservation system recovery to RM recovery process

2015-10-22 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969732#comment-14969732
 ] 

Subru Krishnan commented on YARN-3739:
--

Thanks [~adhoot] for reviewing and committing the patch.

For the record, the _TestApplicationPriority_ failure is unrelated to the patch. I 
ran the test locally and it passes.

> Add reservation system recovery to RM recovery process
> --
>
> Key: YARN-3739
> URL: https://issues.apache.org/jira/browse/YARN-3739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.8.0
>
> Attachments: YARN-3739-v1.patch, YARN-3739-v2.patch, 
> YARN-3739-v3.patch
>
>
> YARN-1051 introduced a reservation system in the YARN RM. This JIRA tracks 
> the recovery of the reservation system in case of a RM failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4256:

Target Version/s: 2.8.0  (was: 2.7.2)

> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores value contains a decimal point, FairScheduler takes the 
> digits after the decimal point as the vcores.
> For the queues below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted to the queue in parallel, all of them 
> stayed in the PENDING state because the vcores value was taken as 2 (the digits 
> after the decimal point), skipping the intended value of 40.
> The pattern matching of vcores in FairSchedulerConfiguration.java should be 
> improved to either throw AllocationConfigurationException("Missing resource") 
> or use the value before the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968607#comment-14968607
 ] 

Wangda Tan commented on YARN-4280:
--

Thanks for sharing your thoughts, [~jlowe]. Agreed, and this requires doing the 
resource limit check for reserved containers as well, and the CS may need to 
allocate reserved containers from the top down instead of going to the leaf queue 
directly. I'm fine with this approach. Any suggestions?

> CapacityScheduler reservations may not prevent indefinite postponement on a 
> busy cluster
> 
>
> Key: YARN-4280
> URL: https://issues.apache.org/jira/browse/YARN-4280
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.6.1, 2.8.0, 2.7.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%); both can run at 
> total cluster capacity. There are 2 applications: appX runs on queue A, always 
> asking for 1 GB (non-AM) containers, and appY runs on queue B asking for 2 GB 
> containers.
> The user limit is high enough for an application to reach 100% of the 
> cluster resource. 
> appX is running at total cluster capacity, full with 1 GB containers, releasing 
> only one container at a time. appY comes in with a request for a 2 GB container, 
> but only 1 GB is free. Ideally, since appY is in the underserved queue, it 
> has higher priority and should reserve space for its 2 GB request. But since this 
> request puts alloc+reserve above the total capacity of the cluster, the 
> reservation is not made. appX then comes in with a 1 GB request and, since 1 GB 
> is still available, the request is allocated. 
> This can continue indefinitely, causing priority inversion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968665#comment-14968665
 ] 

Wangda Tan commented on YARN-4287:
--

Thanks [~nroberts]. +1 to having an interim solution; the proposal looks good 
as well. Will review the patch soon.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 
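
A rough sketch of the two-delay idea described above; the class and method names are illustrative, not the actual CapacityScheduler code:

{code}
/** Illustrative only: separate delay thresholds for rack-local and off-switch. */
public class LocalityDelayPolicy {
  private final int nodeLocalityDelay;  // missed opportunities before accepting rack-local
  private final int rackLocalityDelay;  // missed opportunities before accepting off-switch

  public LocalityDelayPolicy(int nodeLocalityDelay, int rackLocalityDelay) {
    this.nodeLocalityDelay = nodeLocalityDelay;
    this.rackLocalityDelay = rackLocalityDelay;
  }

  public boolean canAssignRackLocal(int missedOpportunities,
                                    boolean lastAssignmentWasRackLocal) {
    // Point 2 above: once rack-local assignments are flowing, do not keep delaying them.
    if (lastAssignmentWasRackLocal) {
      return true;
    }
    return missedOpportunities >= nodeLocalityDelay;
  }

  public boolean canAssignOffSwitch(int missedOpportunities) {
    return missedOpportunities >= rackLocalityDelay;
  }
}
{code}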



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated

2015-10-22 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3573:
-
Summary: MiniMRYarnCluster constructor that starts the timeline server 
using a boolean should be marked deprecated  (was: MiniMRYarnCluster 
constructor that starts the timeline server using a boolean should be marked 
depricated)

> MiniMRYarnCluster constructor that starts the timeline server using a boolean 
> should be marked deprecated
> -
>
> Key: YARN-3573
> URL: https://issues.apache.org/jira/browse/YARN-3573
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3573-002.patch, YARN-3573.patch
>
>
> {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
> starts the timeline server using *boolean enableAHS*. It is better to have 
> the timelineserver started based on the config value.
> We should mark this constructor as deprecated to avoid its future use.
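
The change being asked for is essentially a one-line deprecation marker on the existing constructor; a sketch, where the javadoc wording is an assumption and the constructor body is omitted:

{code}
/**
 * @deprecated enable the timeline server through configuration instead of
 * the boolean flag.
 */
@Deprecated
public MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS) {
  // body unchanged; only the @Deprecated annotation and javadoc are added
}
{code}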



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM

2015-10-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969562#comment-14969562
 ] 

Steve Loughran commented on YARN-1565:
--

Ignore the yarn config one; like you say, it's not a new problem. And no, there 
are no findbugs errors listed.

> Add a way for YARN clients to get critical YARN system properties from the RM
> -
>
> Key: YARN-1565
> URL: https://issues.apache.org/jira/browse/YARN-1565
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
> Attachments: YARN-1565-001.patch, YARN-1565-002.patch, 
> YARN-1565-003.patch
>
>
> If you are trying to build up an AM request, you need to know
> # the limits of memory and cores for the chosen queue
> # the existing YARN classpath
> # the path separator for the target platform (so your classpath comes out 
> right)
> # cluster OS: in case you need some OS-specific changes
> The classpath can be in yarn-site.xml, but a remote client may not have that. 
> The site-xml file doesn't list Queue resource limits, cluster OS or the path 
> separator.
> A way to query the RM for these values would make it easier for YARN clients 
> to build up AM submissions with less guesswork and client-side config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4169) jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969522#comment-14969522
 ] 

Wangda Tan commented on YARN-4169:
--

[~Naganarasimha], sorry for my late response. I think the approach in your 
latest patch looks good; we'd better not do major refactoring of 
NodeStatusUpdater. Rekicking Jenkins...

> jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
> -
>
> Key: YARN-4169
> URL: https://issues.apache.org/jira/browse/YARN-4169
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Steve Loughran
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-4169.v1.001.patch, YARN-4169.v1.002.patch, 
> YARN-4169.v1.003.patch
>
>
> Test failing in [[Jenkins build 
> 402|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/402/testReport/junit/org.apache.hadoop.yarn.server.nodemanager/TestNodeStatusUpdaterForLabels/testNodeStatusUpdaterForNodeLabels/]
> {code}
> java.lang.NullPointerException: null
>   at java.util.HashSet.<init>(HashSet.java:118)
>   at 
> org.apache.hadoop.yarn.nodelabels.NodeLabelTestBase.assertNLCollectionEquals(NodeLabelTestBase.java:103)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels(TestNodeStatusUpdaterForLabels.java:268)
> {code}
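
The NPE comes from constructing a HashSet from a null collection inside the assertion helper. A hedged sketch of a null-tolerant comparison (illustrative, not necessarily the committed fix):

{code}
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class NullSafeAsserts {
  /** Compare two label collections, treating null as an empty set. */
  public static boolean collectionsEqual(Collection<String> expected,
                                         Collection<String> actual) {
    Set<String> e = expected == null ? new HashSet<String>() : new HashSet<String>(expected);
    Set<String> a = actual == null ? new HashSet<String>() : new HashSet<String>(actual);
    return e.equals(a);
  }
}
{code}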



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4285) Display resource usage as percentage of queue and cluster in the RM UI

2015-10-22 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4285:

Attachment: YARN-4285.002.patch

Thanks for the feedback, [~leftnoteasy]. I've uploaded a new patch with your 
suggestions applied. Can you let me know if it looks OK? Thanks!

> Display resource usage as percentage of queue and cluster in the RM UI
> --
>
> Key: YARN-4285
> URL: https://issues.apache.org/jira/browse/YARN-4285
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4285.001.patch, YARN-4285.002.patch
>
>
> Currently, we display the memory and vcores allocated to an app in the RM UI. 
> It would be useful to display the resources consumed as a % of the queue and 
> the cluster to identify apps that are using a lot of resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4285) Display resource usage as percentage of queue and cluster in the RM UI

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969638#comment-14969638
 ] 

Wangda Tan commented on YARN-4285:
--

Thanks for updating, [~vvasudev]!

Some comments:

1) Since ApplicationResourceUsageReport is a public API, I suggest renaming the 
parameter and getter/setter names from Perc to Percentage. Same for 
yarn_protos.proto.

2) 
{code}
if (rmContext.getScheduler() instanceof YarnScheduler) {
  calc = rmContext.getScheduler().getResourceCalculator();
}
{code}
rmContext.getScheduler() should always be a YarnScheduler, correct? This check 
may not be required.

3) Would it be better to change the int percentage to a float, for AppInfo and 
other APIs?

4) Since this is not a trivial patch, could you add some tests to verify that 
SchedulerApplicationAttempt returns the percentage properly?
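
On point 3, a small illustration of why a float is friendlier than an int for these percentages (plain arithmetic, not the actual SchedulerApplicationAttempt code):

{code}
public class UsagePercentage {
  /** Resource usage as a percentage of some total, e.g. queue or cluster memory. */
  public static float percentageOf(long used, long total) {
    if (total <= 0) {
      return 0f;
    }
    return used * 100f / total;   // float keeps sub-1% usage visible
  }

  public static void main(String[] args) {
    // 512 MB used out of a 128 GB cluster: 0.390625% as a float, but 0 as an int.
    System.out.println(percentageOf(512, 128 * 1024));
  }
}
{code}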

> Display resource usage as percentage of queue and cluster in the RM UI
> --
>
> Key: YARN-4285
> URL: https://issues.apache.org/jira/browse/YARN-4285
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4285.001.patch, YARN-4285.002.patch
>
>
> Currently, we display the memory and vcores allocated to an app in the RM UI. 
> It would be useful to display the resources consumed as a % of the queue and 
> the cluster to identify apps that are using a lot of resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969666#comment-14969666
 ] 

Wangda Tan commented on YARN-3216:
--

[~sunilg].

I just checked the code: getComplexConfigurationWithQueueLabels and other methods 
used by TestNodeLabelContainerAllocation have a default node label expression 
setting. IIUC, these tests shouldn't fail. I think they may relate to this code 
in MockRM:

{code}
amResourceRequest.setNodeLabelExpression((amLabel == null) ? "" : amLabel.trim());
{code}

Thanks,

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler

2015-10-22 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-3738:
-
Attachment: YARN-3738-v2.patch

Uploading a rebased patch to incorporate the YARN-3739 commit.

I have also incorporated a couple of pieces of [~adhoot]'s offline feedback for the test:
   * Remove _Thread.sleep()_ as it's unreliable (see the sketch after this list).
   * Update the comments to reflect what is being verified, as the previous ones 
were copy-pasted from another test.
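
For context on the first point, a common alternative to _Thread.sleep()_ in Hadoop tests is a bounded poll; a minimal sketch using GenericTestUtils, where the condition being polled is just a placeholder:

{code}
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

public class WaitForExample {
  /** Placeholder for whatever state the test actually inspects. */
  public interface RecoveryCheck {
    boolean isRecovered();
  }

  public static void waitUntilRecovered(final RecoveryCheck check) throws Exception {
    // Poll every 100 ms and give up after 10 s, instead of a fixed sleep.
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return check.isRecovered();
      }
    }, 100, 10000);
  }
}
{code}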

> Add support for recovery of reserved apps (running under dynamic queues) to 
> Capacity Scheduler
> --
>
> Key: YARN-3738
> URL: https://issues.apache.org/jira/browse/YARN-3738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3738-v2.patch, YARN-3738.patch
>
>
> YARN-3736 persists the current state of the Plan to the RMStateStore. This 
> JIRA covers recovery of the Plan, i.e. dynamic reservation queues with 
> associated apps as part Capacity Scheduler failover mechanism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969804#comment-14969804
 ] 

Xuan Gong commented on YARN-4243:
-

The test case failures and Findbugs warnings are not related to this patch.

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM would shut down if the ZK connection is down when the RM does 
> its initialization. We need to add retries for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4169) jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969529#comment-14969529
 ] 

Hadoop QA commented on YARN-4169:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12763997/YARN-4169.v1.003.patch 
|
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 4c0bae2 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9527/console |


This message was automatically generated.

> jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
> -
>
> Key: YARN-4169
> URL: https://issues.apache.org/jira/browse/YARN-4169
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Steve Loughran
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-4169.v1.001.patch, YARN-4169.v1.002.patch, 
> YARN-4169.v1.003.patch
>
>
> Test failing in [[Jenkins build 
> 402|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/402/testReport/junit/org.apache.hadoop.yarn.server.nodemanager/TestNodeStatusUpdaterForLabels/testNodeStatusUpdaterForNodeLabels/]
> {code}
> java.lang.NullPointerException: null
>   at java.util.HashSet.<init>(HashSet.java:118)
>   at 
> org.apache.hadoop.yarn.nodelabels.NodeLabelTestBase.assertNLCollectionEquals(NodeLabelTestBase.java:103)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels(TestNodeStatusUpdaterForLabels.java:268)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4243:

Attachment: YARN-4243.5.patch

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM would shut down if the ZK connection is down when the RM does 
> its initialization. We need to add retries for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM

2015-10-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969568#comment-14969568
 ] 

Steve Loughran commented on YARN-1565:
--

Looking at the code, given that SystemPropertyInfo is to be a new structure that 
will evolve over time, let's keep it as simple as possible: make the 
classpath field public and skip the explicit accessor method. 

Otherwise, I like this: it addresses a core problem for Hadoop apps. 

I think later on I'd like more of the environment and such, but starting with the 
classpath means we don't have to worry about exposing security information 
that we could otherwise unintentionally leak (other than the list of 
artifacts on the classpath, and hence potentially known vulnerabilities). 

Now, we'll also need some documentation in the yarn-site pages, won't we?
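
A hedged sketch of the "keep it simple" shape being suggested; the class name follows the discussion, but the annotations and field set are assumptions:

{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

/** Illustrative DAO for RM-published system properties; not the actual patch. */
@XmlRootElement(name = "systemProperties")
@XmlAccessorType(XmlAccessType.FIELD)
public class SystemPropertyInfo {
  // Public field, no explicit accessor, per the comment above.
  public String classpath;

  public SystemPropertyInfo() {
    // JAXB requires a no-arg constructor
  }
}
{code}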

> Add a way for YARN clients to get critical YARN system properties from the RM
> -
>
> Key: YARN-1565
> URL: https://issues.apache.org/jira/browse/YARN-1565
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
> Attachments: YARN-1565-001.patch, YARN-1565-002.patch, 
> YARN-1565-003.patch
>
>
> If you are trying to build up an AM request, you need to know
> # the limits of memory and cores for the chosen queue
> # the existing YARN classpath
> # the path separator for the target platform (so your classpath comes out 
> right)
> # cluster OS: in case you need some OS-specific changes
> The classpath can be in yarn-site.xml, but a remote client may not have that. 
> The site-xml file doesn't list Queue resource limits, cluster OS or the path 
> separator.
> A way to query the RM for these values would make it easier for YARN clients 
> to build up AM submissions with less guesswork and client-side config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969576#comment-14969576
 ] 

Junping Du commented on YARN-4243:
--

v5 patch LGTM. +1 pending on Jenkins result.

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM would shut down if the ZK connection is down when the RM does 
> its initialization. We need to add retries for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-10-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969532#comment-14969532
 ] 

Sangjin Lee commented on YARN-4129:
---

Rekicked the jenkins build.

> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129-YARN-2928.002.patch, 
> YARN-4129-YARN-2928.003.patch, YARN-4129-YARN-2928.004.patch, 
> YARN-4129.YARN-2928.001.patch
>
>
> Currently, to add a new timeline event/entity on the RM side, one has to add a 
> method in the publisher and a method in the handler and create a new event class, 
> which looks cumbersome and redundant. Furthermore, not all events might need to 
> be published in both V1 & V2. So we are adopting an approach similar to the one 
> adopted in YARN-3045 (NM side).
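
As a rough illustration of the consolidation being described (not the actual patch), a single event type driven by an enum can replace the current pattern of one event class plus one publisher method plus one handler method per event:

{code}
/** Illustrative only: one generic publish event instead of a class per event. */
public class TimelinePublishEvent {
  public enum EventType { APP_CREATED, APP_FINISHED, ATTEMPT_REGISTERED }

  private final EventType type;
  private final Object payload;   // e.g. an app or attempt snapshot

  public TimelinePublishEvent(EventType type, Object payload) {
    this.type = type;
    this.payload = payload;
  }

  public EventType getType() { return type; }
  public Object getPayload() { return payload; }
}
{code}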



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4278) On AM registration, the response should include the cluster nodes report when demanded by the registration request

2015-10-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969560#comment-14969560
 ] 

Steve Loughran commented on YARN-4278:
--

One thing we do have to consider is the increased cost of this on every AM 
registration. Does anyone know how expensive it is when the client asks for this?

Maybe during registration the client should set a boolean to ask for the map: if 
unset, it isn't returned. This would avoid generating and marshalling the map over 
the wire for those AMs that don't want the data.
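
A purely hypothetical sketch of the opt-in shape being floated here; none of these names exist in the actual AM registration API:

{code}
/** Hypothetical request fragment: the AM opts in to receiving the node report. */
public class RegisterRequestSketch {
  private boolean wantClusterNodeReport = false;   // default: no extra payload

  public void setWantClusterNodeReport(boolean want) {
    this.wantClusterNodeReport = want;
  }

  public boolean isWantClusterNodeReport() {
    // The RM would only build and marshal the node list when this returns true.
    return wantClusterNodeReport;
  }
}
{code}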

> On AM registration, the response should include the cluster nodes report when 
> demanded by the registration request.
> --
>
> Key: YARN-4278
> URL: https://issues.apache.org/jira/browse/YARN-4278
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> From the yarn-dev mailing list discussion thread 
> [Thread-1|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c0ee80f6f7a98a64ebd18f2be839c91156798a...@szxeml512-mbs.china.huawei.com%3E]
>  
> [Thread-2|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c4f7812fc-ab5d-465d-ac89-824735698...@hortonworks.com%3E]
>  
> Slider needs to know cluster node details in order to provide support for 
> affinity/anti-affinity on containers.
> Current behavior: during the life span of an application, updatedNodes are sent 
> in the allocate request only if there is a change in the nodes, such as 
> added/removed/'state change'. Otherwise the cluster nodes are not updated to the AM.
> One approach suggested by [~ste...@apache.org] is to let the AM registration 
> response hold the cluster nodes report.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4009) CORS support for ResourceManager REST API

2015-10-22 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-4009:
--
Attachment: YARN-4009.8.patch

[~vvasudev], you are correct. There is no need for the new flag in the current 
code base. Removed in patch 8. If 
*yarn.timeline-service.http-cross-origin.enabled* is true, then it will use the 
timeline conf values and then fall back to the common conf settings. I think 
this is an acceptable compromise between backwards compatibility and utilizing 
the new functionality.
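
A sketch of the fallback pattern described above, using the standard Configuration.get(key, default) lookup; the exact property suffixes are assumptions for illustration:

{code}
import org.apache.hadoop.conf.Configuration;

public class CorsConfigFallback {
  static final String TIMELINE_PREFIX = "yarn.timeline-service.http-cross-origin.";
  static final String COMMON_PREFIX = "hadoop.http.cross-origin.";

  /** Prefer the timeline-specific value, falling back to the common one. */
  public static String allowedOrigins(Configuration conf) {
    String common = conf.get(COMMON_PREFIX + "allowed-origins", "*");
    return conf.get(TIMELINE_PREFIX + "allowed-origins", common);
  }
}
{code}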

> CORS support for ResourceManager REST API
> -
>
> Key: YARN-4009
> URL: https://issues.apache.org/jira/browse/YARN-4009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Vasudev
> Attachments: YARN-4009.001.patch, YARN-4009.002.patch, 
> YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, 
> YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.8.patch, 
> YARN-4009.LOGGING.patch, YARN-4009.LOGGING.patch
>
>
> Currently the REST API's do not have CORS support. This means any UI (running 
> in browser) cannot consume the REST API's. For ex Tez UI would like to use 
> the REST API for getting application, application attempt information exposed 
> by the API's. 
> It would be very useful if CORS is enabled for the REST API's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969748#comment-14969748
 ] 

zhihai xu commented on YARN-4256:
-

Committed this to trunk and branch-2. Thanks [~Prabhu Joseph] for reporting the 
issue, thanks [~hex108] for the patch, and thanks [~brahmareddy] for the 
additional review!

> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.7.2
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores value contains a decimal point, FairScheduler takes the 
> digits after the decimal point as the vcores.
> For the queues below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted to the queue in parallel, all of them 
> stayed in the PENDING state because the vcores value was taken as 2 (the digits 
> after the decimal point), skipping the intended value of 40.
> The pattern matching of vcores in FairSchedulerConfiguration.java should be 
> improved to either throw AllocationConfigurationException("Missing resource") 
> or use the value before the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969753#comment-14969753
 ] 

Hadoop QA commented on YARN-4243:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  22m 16s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 12s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 27s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 44s | The applied patch generated  1 
new checkstyle issues (total was 15, now 16). |
| {color:red}-1{color} | checkstyle |   3m 17s | The applied patch generated  2 
new checkstyle issues (total was 211, now 212). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m 35s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |   7m 33s | Tests passed in 
hadoop-common. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   2m  0s | Tests failed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  68m 50s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 133m 22s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.logaggregation.TestAggregatedLogsBlock |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768079/YARN-4243.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4c0bae2 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/diffcheckstylehadoop-common.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9526/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9526/console |


This message was automatically generated.

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM would shut down if the ZK connection is down when the RM does 
> its initialization. We need to add retries for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3739) Add reservation system recovery to RM recovery process

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969607#comment-14969607
 ] 

Hudson commented on YARN-3739:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2463 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2463/])
YARN-3739. Add reservation system recovery to RM recovery process. (adhoot: rev 
2798723a5443d04455b9d79c48d61f435ab52267)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/PlanningAlgorithm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacitySchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanEdit.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java


> Add reservation system recovery to RM recovery process
> --
>
> Key: YARN-3739
> URL: https://issues.apache.org/jira/browse/YARN-3739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.8.0
>
> Attachments: YARN-3739-v1.patch, YARN-3739-v2.patch, 
> YARN-3739-v3.patch
>
>
> YARN-1051 introduced a reservation system in the YARN RM. This JIRA tracks 
> the recovery of the reservation system in case of a RM failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969793#comment-14969793
 ] 

Hadoop QA commented on YARN-4129:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m 25s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   9m 20s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m  8s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 31s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 31s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 51s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 44s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  64m 20s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 111m  4s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768050/YARN-4129-YARN-2928.004.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 581a6b6 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9529/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9529/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9529/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9529/console |


This message was automatically generated.

> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129-YARN-2928.002.patch, 
> YARN-4129-YARN-2928.003.patch, YARN-4129-YARN-2928.004.patch, 
> YARN-4129.YARN-2928.001.patch
>
>
> Currently, to add a new timeline event/entity on the RM side, one has to add a 
> method in the publisher and a method in the handler and create a new event class, 
> which looks cumbersome and redundant. Furthermore, not all events might need to 
> be published in both V1 & V2. So we are adopting an approach similar to the one 
> adopted in YARN-3045 (NM side).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API

2015-10-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968844#comment-14968844
 ] 

Varun Vasudev commented on YARN-4009:
-

[~jeagles] - with your patch, the timeline service will always require its own 
CORS configuration. Is my understanding correct? In that case, we should remove
{code}
+  /** Enables cross origin support for timeline server. Uses the filter
+   * in hadoop common instead of the timeline server specific one. */
+  public static final String TIMELINE_SERVICE_CROSS_ORIGIN_ENABLED =
+  TIMELINE_SERVICE_PREFIX + "cross-origin.enabled";
+
{code}
from YarnConfiguration.java.

The rest of the patch looks good to me.

> CORS support for ResourceManager REST API
> -
>
> Key: YARN-4009
> URL: https://issues.apache.org/jira/browse/YARN-4009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Vasudev
> Attachments: YARN-4009.001.patch, YARN-4009.002.patch, 
> YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, 
> YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.LOGGING.patch, 
> YARN-4009.LOGGING.patch
>
>
> Currently the REST APIs do not have CORS support. This means any UI (running 
> in a browser) cannot consume the REST APIs. For example, the Tez UI would like 
> to use the REST API to get the application and application attempt information 
> exposed by the APIs. 
> It would be very useful if CORS were enabled for the REST APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4186) Make WebAppUtils a public API for yarn

2015-10-22 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4186:
-
Issue Type: Improvement  (was: Bug)

> Make WebAppUtils a public API for yarn
> --
>
> Key: YARN-4186
> URL: https://issues.apache.org/jira/browse/YARN-4186
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: YARN-4186-v1.patch
>
>
> Application types like Tez will want to expose AM container log location to 
> users without having to make a call to the RM.
> Exposing getRunningLogURL as public will provide this functionality.
> Jon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968961#comment-14968961
 ] 

Hadoop QA commented on YARN-3216:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   8m  7s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 26s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 15s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  64m  8s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 105m 24s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768001/0009-YARN-3216.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 381610d |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9520/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9520/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9520/console |


This message was automatically generated.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4286) yarn-default.xml has a typo: yarn-nodemanager.local-dirs should be yarn.nodemanager.local-dirs

2015-10-22 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969020#comment-14969020
 ] 

Tsuyoshi Ozawa commented on YARN-4286:
--

This issue is NOT an incompatible change, since the typo exists only in 
yarn-default.xml (not in YarnConfiguration.java).

> yarn-default.xml has a typo: yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs
> --
>
> Key: YARN-4286
> URL: https://issues.apache.org/jira/browse/YARN-4286
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Dustin Cote
>Priority: Trivial
>  Labels: newbie
>
> In the yarn-default.xml, the property yarn.nodemanager.local-dirs is 
> referenced as yarn-nodemanager.local-dirs in multiple places.  This should be 
> a straightforward fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969840#comment-14969840
 ] 

Hudson commented on YARN-4256:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #570 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/570/])
YARN-4256. YARN fair scheduler vcores with decimal values. Contributed (zxu: 
rev 960201b79b9f2ca40f8eadb21a2f9fe37dde2b5d)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java


> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores value is given as a decimal, FairScheduler takes only 
> the digits after the decimal point as the vcores.
> For the queue below:
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> When many applications were submitted in parallel, all stayed in the PENDING 
> state because the vcores value was taken as 2, skipping the 40.
> The vcores pattern matching in FairSchedulerConfiguration.java has to be 
> improved to either throw 
> AllocationConfigurationException("Missing resource") or use the value before 
> the decimal point.
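As a rough illustration of the truncation described above, using a simplified 
regex (not the actual FairSchedulerConfiguration pattern):

{code}
// Simplified demo of the pitfall: a digits-only pattern applied to a decimal
// vcores value silently matches only the digits after the dot.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VcoresParseDemo {
  public static void main(String[] args) {
    Matcher m = Pattern.compile("(\\d+)\\s*vcores")
        .matcher("3 mb,40.2 vcores,30.25 disks");
    if (m.find()) {
      System.out.println(m.group(1));  // prints "2", not 40
    }
  }
}
{code}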



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969857#comment-14969857
 ] 

Hudson commented on YARN-4256:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8691 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8691/])
YARN-4256. YARN fair scheduler vcores with decimal values. Contributed (zxu: 
rev 960201b79b9f2ca40f8eadb21a2f9fe37dde2b5d)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java


> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores value is given as a decimal, FairScheduler takes only 
> the digits after the decimal point as the vcores.
> For the queue below:
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> When many applications were submitted in parallel, all stayed in the PENDING 
> state because the vcores value was taken as 2, skipping the 40.
> The vcores pattern matching in FairSchedulerConfiguration.java has to be 
> improved to either throw 
> AllocationConfigurationException("Missing resource") or use the value before 
> the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969950#comment-14969950
 ] 

Hadoop QA commented on YARN-2913:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 29s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  9s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 50s | The applied patch generated  2 
new checkstyle issues (total was 37, now 35). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  56m  4s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  97m 16s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768102/YARN-2913.v4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 960201b |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9530/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9530/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9530/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9530/console |


This message was automatically generated.

> Fair scheduler should have ability to set MaxResourceDefault for each queue
> ---
>
> Key: YARN-2913
> URL: https://issues.apache.org/jira/browse/YARN-2913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, 
> YARN-2913.v3.patch, YARN-2913.v4.patch
>
>
> Queues that are created on the fly have the max resource of the entire 
> cluster. Fair Scheduler should have a default maxResource to control the 
> maxResource of those queues



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-10-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970043#comment-14970043
 ] 

Naganarasimha G R commented on YARN-4129:
-

Thanks [~sjlee0] for the review!

> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129-YARN-2928.002.patch, 
> YARN-4129-YARN-2928.003.patch, YARN-4129-YARN-2928.004.patch, 
> YARN-4129.YARN-2928.001.patch
>
>
> Currently, to add a new timeline event/entity on the RM side, one has to add 
> a method in the publisher, a method in the handler, and a new event class, 
> which is cumbersome and redundant. Furthermore, not all events may need to be 
> published in both V1 & V2. So we adopt an approach similar to the one used in 
> YARN-3045 (NM side).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-10-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970044#comment-14970044
 ] 

Naganarasimha G R commented on YARN-4129:
-

Thanks [~sjlee0] for the review!

> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129-YARN-2928.002.patch, 
> YARN-4129-YARN-2928.003.patch, YARN-4129-YARN-2928.004.patch, 
> YARN-4129.YARN-2928.001.patch
>
>
> Currently, to add a new timeline event/entity on the RM side, one has to add 
> a method in the publisher, a method in the handler, and a new event class, 
> which is cumbersome and redundant. Furthermore, not all events may need to be 
> published in both V1 & V2. So we adopt an approach similar to the one used in 
> YARN-3045 (NM side).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969841#comment-14969841
 ] 

Junping Du commented on YARN-4243:
--

Committing it now.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM shuts down if the ZK connection is down while the RM does 
> its initialization. We need to add a retry for this part.
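A minimal sketch of the bounded retry being proposed, with hypothetical names 
(connectToZk() and the retry limits), not the actual YARN-4243 patch:

{code}
// Hypothetical sketch: retry the initial ZK connection a bounded number of
// times instead of shutting the RM down on the first failure.
private void connectWithRetry(int maxRetries, long retryIntervalMs)
    throws IOException, InterruptedException {
  for (int attempt = 1; ; attempt++) {
    try {
      connectToZk();                    // e.g. start the embedded elector
      return;                           // connected, stop retrying
    } catch (IOException e) {
      if (attempt >= maxRetries) {
        throw e;                        // give up after the configured limit
      }
      Thread.sleep(retryIntervalMs);    // back off before the next attempt
    }
  }
}
{code}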



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-10-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969858#comment-14969858
 ] 

Sangjin Lee commented on YARN-4129:
---

Thanks for updating the patch [~Naganarasimha]. I'm +1 on the latest patch 
(v.4). I'll let others comment on this for today before I commit it. Thanks!

> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129-YARN-2928.002.patch, 
> YARN-4129-YARN-2928.003.patch, YARN-4129-YARN-2928.004.patch, 
> YARN-4129.YARN-2928.001.patch
>
>
> Currently, to add a new timeline event/entity on the RM side, one has to add 
> a method in the publisher, a method in the handler, and a new event class, 
> which is cumbersome and redundant. Furthermore, not all events may need to be 
> published in both V1 & V2. So we adopt an approach similar to the one used in 
> YARN-3045 (NM side).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969982#comment-14969982
 ] 

Hudson commented on YARN-4243:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8692 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8692/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 
0fce5f9a496925f0d53ea6c14318c9b513de9882)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java


> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM shuts down if the ZK connection is down while the RM does 
> its initialization. We need to add a retry for this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969921#comment-14969921
 ] 

Hudson commented on YARN-4256:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1306 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1306/])
YARN-4256. YARN fair scheduler vcores with decimal values. Contributed (zxu: 
rev 960201b79b9f2ca40f8eadb21a2f9fe37dde2b5d)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java


> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores value is given as a decimal, FairScheduler takes only 
> the digits after the decimal point as the vcores.
> For the queue below:
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> When many applications were submitted in parallel, all stayed in the PENDING 
> state because the vcores value was taken as 2, skipping the 40.
> The vcores pattern matching in FairSchedulerConfiguration.java has to be 
> improved to either throw 
> AllocationConfigurationException("Missing resource") or use the value before 
> the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969973#comment-14969973
 ] 

Hudson commented on YARN-4256:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #585 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/585/])
YARN-4256. YARN fair scheduler vcores with decimal values. Contributed (zxu: 
rev 960201b79b9f2ca40f8eadb21a2f9fe37dde2b5d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java


> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores value is given as a decimal, FairScheduler takes only 
> the digits after the decimal point as the vcores.
> For the queue below:
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> When many applications were submitted in parallel, all stayed in the PENDING 
> state because the vcores value was taken as 2, skipping the 40.
> The vcores pattern matching in FairSchedulerConfiguration.java has to be 
> improved to either throw 
> AllocationConfigurationException("Missing resource") or use the value before 
> the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-22 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4287:
-
Attachment: YARN-4287-v2.patch

Fixed unit test failures and addressed most checkstyle errors

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969820#comment-14969820
 ] 

Jason Lowe commented on YARN-4041:
--

The problem with checking the renewer event queue directly is that the queue 
can be empty but processing has not yet completed.  Threads can still be 
executing the last events, having just pulled them from the queue to leave it 
empty.  Therefore the test is still racy.  A simpler approach would be to just 
keep checking if the tokens are equal.  If they aren't then sleep for a bit 
then try again, up to some limit of time to keep checking.

By the way, we should not sleep an entire second between checks.  All those 
seconds of waiting add up across all of our tests doing it, making it take 
significantly longer to run them overall.  We should be sleeping for only 10ms 
or so.  That's still a large amount of time for modern processors to get work 
done while we're waiting, and we still won't be spinning non-stop on the CPU.
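A minimal sketch of that polling pattern, with hypothetical accessors 
(expectedToken(), currentToken()) rather than the actual test code:

{code}
// Hypothetical sketch of the suggested check: re-test the condition every 10ms
// until it holds or an overall time limit is exceeded.
private static void waitForTokenRenewal(long timeoutMs) throws Exception {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (!expectedToken().equals(currentToken())) {
    if (System.currentTimeMillis() > deadline) {
      org.junit.Assert.fail("token was not renewed within " + timeoutMs + " ms");
    }
    Thread.sleep(10);   // short sleep between checks, as suggested above
  }
}
{code}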


> Slow delegation token renewal can severely prolong RM recovery
> --
>
> Key: YARN-4041
> URL: https://issues.apache.org/jira/browse/YARN-4041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, 
> 0003-YARN-4041.patch, 0004-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew 
> delegation tokens for every active application.  If a token server happens to 
> be down or is running slow and a lot of the active apps were using tokens 
> from that server then it can have a huge impact on the time it takes the RM 
> to process the restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970026#comment-14970026
 ] 

Wangda Tan commented on YARN-4287:
--

Some suggestions
1) RACK_LOCALITY_EXTRA_DELAY -> RACK_LOCALITY_DELAY, same as configuration 
property name (rack-locality-delay)

2) Do you think it is a good idea to separate the old rack-locality-delay 
computation (using getLocalityWaitFactor) from the new rack-locality-delay config? 
Now rack-locality-delay = min(old-computed-delay, new-specified-delay); since 
getLocalityWaitFactor has some flaws, I think we can make this configurable 
so the user can choose between the specified and the computed delay.

Pseudo code may look like:
{code}
if type is OFF_SWITCH:
  if rack-locality-delay specified:
    delay = rack-locality-delay
  else:
    delay = computed-locality-delay
else if type is RACK_LOCAL:
  delay = min(node-locality-delay, computed-or-specified-rack-locality-delay)
{code}

3)
bq. When we're getting rackLocal assignments, subsequent rackLocal assignments 
should not be delayed
+1 to the fix. Since this is a behavior change, do you think we need to make 
it configurable? This change could lead to the number of node-local container 
allocations decreasing in some cases.


Thanks,
Wangda
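
A Java rendering of the pseudo code above, with hypothetical parameter names 
(only a sketch, not the actual CapacityScheduler change):

{code}
// Sketch only: pick the effective delay per request type. NodeType here is the
// scheduler's NODE_LOCAL / RACK_LOCAL / OFF_SWITCH enum; a configuredRackDelay
// of -1 means rack-locality-delay was not explicitly specified.
int resolveDelay(NodeType type, int nodeLocalityDelay,
    int configuredRackDelay, int computedLocalityDelay) {
  int rackDelay = configuredRackDelay >= 0
      ? configuredRackDelay : computedLocalityDelay;
  if (type == NodeType.OFF_SWITCH) {
    return rackDelay;
  } else if (type == NodeType.RACK_LOCAL) {
    return Math.min(nodeLocalityDelay, rackDelay);
  }
  return 0;  // NODE_LOCAL requests are not delayed
}
{code}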

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4291) ResourceUtilization should be a part of NodeReport

2015-10-22 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4291:


 Summary: ResourceUtilization should be a part of NodeReport 
 Key: YARN-4291
 URL: https://issues.apache.org/jira/browse/YARN-4291
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-10-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970108#comment-14970108
 ] 

Naganarasimha G R commented on YARN-3367:
-

Hi [~djp], 
Can you share your thoughts on why events are required to be in order? Are you 
visualizing some erroneous situation *when the sync and async events are out of 
order*? My guess is that ordering might matter for async events such as the 
resource utilization of a given container, so that aggregation and accumulation 
are correct, but I was not able to identify any scenario where sync and async 
events need to be ordered relative to each other. If it is required, then I need 
to bring in some additional modifications to the patch.

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
> Attachments: YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we have a loop in TimelineClient that waits for 
> collectorServiceAddress to be ready before posting any entity. In consumers of 
> TimelineClient (like the AM), we start a new thread for each call to avoid a 
> potential deadlock in the main thread. This approach has at least 3 major 
> defects:
> 1. The consumer needs additional code to wrap a thread around each call to 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events can be out of order because each posting 
> thread leaves the waiting loop at a random time.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the entities into a queue, and a 
> separate thread delivers the queued entities to the collector via REST 
> calls.
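A minimal sketch of that queue-plus-dispatcher idea (names such as 
putEntitiesAsync() and postToCollector() are hypothetical, not the actual 
TimelineClient API):

{code}
// Hypothetical sketch: producers enqueue entities and return immediately;
// a single dispatcher thread drains the queue and posts to the collector.
private final BlockingQueue<TimelineEntity> pending =
    new LinkedBlockingQueue<TimelineEntity>();
private volatile boolean stopped = false;

public void putEntitiesAsync(TimelineEntity... entities) {
  Collections.addAll(pending, entities);   // enqueue, no per-call thread needed
}

// run by one dispatcher thread started in serviceStart()
private void dispatchLoop() throws InterruptedException {
  while (!stopped) {
    TimelineEntity next = pending.take();  // blocks until an entity is queued
    postToCollector(next);                 // hypothetical REST call to the collector
  }
}
{code}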



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4285) Display resource usage as percentage of queue and cluster in the RM UI

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970172#comment-14970172
 ] 

Hadoop QA commented on YARN-4285:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  38m 19s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |  11m 13s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  13m 40s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 31s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m 13s | The applied patch generated  1 
new checkstyle issues (total was 10, now 11). |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   2m  9s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 46s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   9m 16s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 33s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |  35m 28s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m 48s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m 59s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:red}-1{color} | yarn tests | 267m 10s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 390m 20s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | hadoop.yarn.client.api.impl.TestYarnClient |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.TestRM |
|   | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl |
|   | hadoop.yarn.server.resourcemanager.TestMoveApplication |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
 |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
 |
|   | hadoop.yarn.server.resourcemanager.TestRMAdminService |
|   | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
 |
|   | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
 |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager |
| Timed out tests | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
|   | org.apache.hadoop.yarn.server.resourcemanager.TestSignalContainer |
|   | org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
 |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority
 |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId
 |
|   | 

[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970180#comment-14970180
 ] 

Hudson commented on YARN-4256:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2517 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2517/])
YARN-4256. YARN fair scheduler vcores with decimal values. Contributed (zxu: 
rev 960201b79b9f2ca40f8eadb21a2f9fe37dde2b5d)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java


> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores value is given as a decimal, FairScheduler takes only 
> the digits after the decimal point as the vcores.
> For the queue below:
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> When many applications were submitted in parallel, all stayed in the PENDING 
> state because the vcores value was taken as 2, skipping the 40.
> The vcores pattern matching in FairSchedulerConfiguration.java has to be 
> improved to either throw 
> AllocationConfigurationException("Missing resource") or use the value before 
> the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970179#comment-14970179
 ] 

Hudson commented on YARN-4243:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2517 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2517/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 
0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM shuts down if the ZK connection is down while the RM does 
> its initialization. We need to add a retry for this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970225#comment-14970225
 ] 

Jun Gong commented on YARN-4256:


Thanks [~zxu] for the review and commit!

> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores value is given as a decimal, FairScheduler takes only 
> the digits after the decimal point as the vcores.
> For the queue below:
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> When many applications were submitted in parallel, all stayed in the PENDING 
> state because the vcores value was taken as 2, skipping the 40.
> The vcores pattern matching in FairSchedulerConfiguration.java has to be 
> improved to either throw 
> AllocationConfigurationException("Missing resource") or use the value before 
> the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-10-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970224#comment-14970224
 ] 

Sunil G commented on YARN-4290:
---

Yes, this will be a really helpful feature. I can help with this if you haven't 
started; kindly reassign otherwise. Thank you. 
One more point: we could revisit some more CLI commands to add 
*show-details*. It is a good sub-command for appending whatever additional 
details are needed.

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode

2015-10-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970229#comment-14970229
 ] 

Varun Saxena commented on YARN-4127:


Sorry [~jianhe], missed your earlier comment.
Makes sense to delete fencing node path. Will update a patch shortly.

> RM fail with noAuth error if switched from non-failover mode to failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127.01.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string. 
> If RM failover is then switched off, the RM cannot load data from ZK 
> and fails with a noAuth error. After I reset the root node ACL, it can 
> access again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that, in non-failover mode, the RM doesn't use the *RM-ID* to 
> connect with ZK and thus fails with a noAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode

2015-10-22 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4127:
---
Summary: RM fail with noAuth error if switched from failover mode to 
non-failover mode   (was: RM fail with noAuth error if switched from 
non-failover mode to failover mode )

> RM fail with noAuth error if switched from failover mode to non-failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127.01.patch, YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string. 
> If RM failover is then switched off, the RM cannot load data from ZK 
> and fails with a noAuth error. After I reset the root node ACL, it can 
> access again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that, in non-failover mode, the RM doesn't use the *RM-ID* to 
> connect with ZK and thus fails with a noAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode

2015-10-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970079#comment-14970079
 ] 

Jian He commented on YARN-4127:
---

Hi [~varun_saxena], any updates?

> RM fail with noAuth error if switched from non-failover mode to failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127.01.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string. 
> If RM failover is then switched off, the RM cannot load data from ZK 
> and fails with a noAuth error. After I reset the root node ACL, it can 
> access again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that, in non-failover mode, the RM doesn't use the *RM-ID* to 
> connect with ZK and thus fails with a noAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970078#comment-14970078
 ] 

Wangda Tan commented on YARN-3223:
--

[~brookz],
Thanks for working on this JIRA. I took a look at the patch; some comments:

I think the general approach is fine: SchedulerNode keeps updating its total 
resource to match the used resource while the node is in the decommissioning 
state, so the available resource will be 0.

But I think there are some other places we need to take care of:
- Suggest using {{CapacityScheduler#updateNodeAndQueueResource}} to update 
resources; we need to update the queue's resource and the cluster metrics as well.
- When async scheduling is enabled, we need to make sure the decommissioning 
node's total resource is updated so no new container will be allocated on these 
nodes.

And after this patch, I think we need to add the total decommissioning nodes' 
resources to the cluster metrics (equal to sum(decommissioning-node.used-resource)).
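
A rough sketch of the node-resource part of this idea (method names are 
approximations for illustration, not the actual patch): while a node is 
decommissioning, its total resource tracks its used resource, so 
available = total - used stays at 0 and the queue/cluster metrics are refreshed 
through the scheduler's resource-update path.

{code}
// Approximate sketch only; the real change should go through something like
// CapacityScheduler#updateNodeAndQueueResource so queue and cluster metrics
// stay consistent. Names and signatures here are illustrative.
if (rmNode.getState() == NodeState.DECOMMISSIONING) {
  Resource used = schedulerNode.getUsedResource();
  // Pin the node total to the used resource: available becomes 0, so no new
  // containers are allocated on this node (also important for async scheduling).
  scheduler.updateNodeAndQueueResource(rmNode, ResourceOption.newInstance(used, 0));
}
{code}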

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: making RMNode keep track of the old resource for possible rollback, 
> keeping the available resource at 0, and updating the used resource when a 
> container finishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970247#comment-14970247
 ] 

Sunil G commented on YARN-3216:
---

Yes Wangda. I feel we do not need to give a default label ("") in the AM
resource request from MockRM.




> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode

2015-10-22 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4127:
---
Attachment: YARN-4127.02.patch

> RM fail with noAuth error if switched from non-failover mode to failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127.01.patch, YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string. 
> If RM failover is then switched to disabled, the RM cannot load data from ZK 
> and fails with a noAuth error. After I reset the root node ACL, it can access 
> it again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
>  The problem may be that in non-failover mode, the RM doesn't use the *RM-ID* to 
> connect to ZK and thus fails with a NoAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970270#comment-14970270
 ] 

Hudson commented on YARN-4256:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #528 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/528/])
YARN-4256. YARN fair scheduler vcores with decimal values. Contributed (zxu: 
rev 960201b79b9f2ca40f8eadb21a2f9fe37dde2b5d)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java


> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores are configured with a decimal value, FairScheduler takes 
> the digits after the decimal point as the vcores.
> For the queue below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted to the queue in parallel, all of them 
> stayed in the PENDING state because the vcores were taken as 2, skipping the 
> value 40.
> The pattern matching of vcores in FairSchedulerConfiguration.java has to be 
> improved to either throw 
> AllocationConfigurationException("Missing resource") or consider the value 
> before the decimal point.
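
To illustrate the parsing problem (the regexes below are illustrative, not the 
actual pattern in FairSchedulerConfiguration.java): a lenient pattern that only 
captures a run of digits will happily match the fractional part of "40.2 vcores" 
and return 2, while a stricter pattern simply rejects the malformed value so it 
can be reported as an error.
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VcoresParseDemo {
  // Lenient: the first place where "<digits> vcores" matches in "40.2 vcores" is the "2".
  private static final Pattern LENIENT = Pattern.compile("(\\d+)\\s*vcores");
  // Stricter: the digits must not be preceded by another digit or a '.',
  // so "40.2 vcores" does not match at all and can be rejected.
  private static final Pattern STRICT = Pattern.compile("(?<![\\d.])(\\d+)\\s*vcores");

  public static void main(String[] args) {
    String alloc = "3 mb,40.2 vcores,30.25 disks";
    Matcher lenient = LENIENT.matcher(alloc);
    System.out.println(lenient.find()
        ? "lenient -> " + lenient.group(1)             // prints 2
        : "lenient -> no match");
    Matcher strict = STRICT.matcher(alloc);
    System.out.println(strict.find()
        ? "strict -> " + strict.group(1)
        : "strict -> reject (invalid vcores value)");  // this branch is taken
  }
}
{code}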



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970269#comment-14970269
 ] 

Hudson commented on YARN-4243:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #528 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/528/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 
0fce5f9a496925f0d53ea6c14318c9b513de9882)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java


> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM shuts down if the ZK connection is down while the RM is 
> doing its initialization. We need to add a retry to this part.
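
The kind of bounded retry being asked for, as a generic sketch (the class name, 
attempt count and sleep interval are placeholders, not the configuration keys 
added by the patch):
{code}
import java.io.IOException;
import java.util.concurrent.Callable;

/** Generic bounded-retry helper; a stand-in for retrying the ZK connection setup. */
public final class BoundedRetry {

  private BoundedRetry() {
  }

  /** Retries the action on IOException up to maxAttempts times (maxAttempts >= 1). */
  public static <T> T callWithRetry(Callable<T> action, int maxAttempts, long sleepMs)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (IOException e) {
        // Treat connection failures as transient instead of shutting down immediately.
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMs);
        }
      }
    }
    throw last;
  }
}
{code}
The idea is simply that serviceInit would wrap its connection setup in such a 
loop rather than letting the first connection failure propagate and stop the RM.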



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970054#comment-14970054
 ] 

Hadoop QA commented on YARN-4009:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  28m 55s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   9m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 31s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   4m 14s | Site still builds. |
| {color:red}-1{color} | checkstyle |   3m 26s | The applied patch generated  5 
new checkstyle issues (total was 0, now 5). |
| {color:red}-1{color} | checkstyle |   4m  3s | The applied patch generated  2 
new checkstyle issues (total was 211, now 212). |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 46s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |  12m  0s | The patch appears to introduce 4 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |   9m 26s | Tests passed in 
hadoop-common. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m 24s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   4m 22s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |   9m 12s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  65m 12s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 166m 55s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-common |
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | 
hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768100/YARN-4009.8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 960201b |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/diffcheckstylehadoop-common.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9531/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 

[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970148#comment-14970148
 ] 

Hudson commented on YARN-4243:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1307 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1307/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 
0fce5f9a496925f0d53ea6c14318c9b513de9882)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-yarn-project/CHANGES.txt


> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM shuts down if the ZK connection is down while the RM is 
> doing its initialization. We need to add a retry to this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-10-22 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-4290:
-

Assignee: Sunil G

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4291) ResourceUtilization should be a part of NodeReport API.

2015-10-22 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-4291:
---

Assignee: Naganarasimha G R

> ResourceUtilization should be a part of NodeReport API.
> ---
>
> Key: YARN-4291
> URL: https://issues.apache.org/jira/browse/YARN-4291
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4179) [reader implementation] support flow activity queries based on time

2015-10-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970264#comment-14970264
 ] 

Varun Saxena commented on YARN-4179:


Thanks [~sjlee0] for the commit.

> [reader implementation] support flow activity queries based on time
> ---
>
> Key: YARN-4179
> URL: https://issues.apache.org/jira/browse/YARN-4179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>Priority: Minor
> Fix For: YARN-2928
>
> Attachments: YARN-4179-YARN-2928.01.patch, 
> YARN-4179-YARN-2928.02.patch, YARN-4179-YARN-2928.03.patch
>
>
> This came up as part of YARN-4074 and YARN-4075.
> Currently the only query pattern that's supported on the flow activity table 
> is by cluster only. But it might be useful to support queries by cluster and 
> certain date or dates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-10-22 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2729:

Attachment: YARN-2729.20151023-1.patch

Thanks [~rohithsharma] & [~sunilg] for the comments. I have updated the patch 
with the approach specified, but I have a query.
bq.  If you call super.serviceStop() first, D1 clean up will not happen first 
rather, D1(D,C,B,A)-cleanup D1 which is at the end.
I could not follow the above statement clearly. IIUC, the order would be either 
*D1,D,C,B,A* when {{super.serviceStop()}} is called first, or *D,D1,C,B,A* when 
{{super.serviceStop()}} is called last. The latter is what we were trying to 
achieve earlier; correct me if I am wrong. 

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, 
> YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch, 
> YARN-2729.20150925-1.patch, YARN-2729.20151015-1.patch, 
> YARN-2729.20151019.patch, YARN-2729.20151023-1.patch, 
> YARN-2729.20151310-1.patch, YARN-2729.20151310-2.patch
>
>
> Support a script-based NodeLabelsProvider interface in the Distributed Node Label 
> Configuration setup. 
> Miscellaneous issues:
> # In ConfigurationNodeLabelsProvider, instead of taking multiple labels from a 
> single configuration, we need to support an exclusive configuration for the 
> partition (single label).
> # Proper logging when registration of a node fails.
> # The classloader was not getting reset from the custom class loader in 
> TestConfigurationNodeLabelsProvider.java, which could make test cases fail under 
> certain conditions.
> # In ResourceTrackerService we need to consider distributed configuration 
> only when node labels are enabled; otherwise this leads to lots of logs in 
> certain conditions.
> # NodeLabelsProvider needs to be an interface rather than an abstract class. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-10-22 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4292:


 Summary: ResourceUtilization should be a part of NodeInfo REST API
 Key: YARN-4292
 URL: https://issues.apache.org/jira/browse/YARN-4292
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-10-22 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4293:


 Summary: ResourceUtilization should be a part of yarn node CLI
 Key: YARN-4293
 URL: https://issues.apache.org/jira/browse/YARN-4293
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4291) ResourceUtilization should be a part of NodeReport API.

2015-10-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4291:
-
Summary: ResourceUtilization should be a part of NodeReport API.  (was: 
ResourceUtilization should be a part of NodeReport )

> ResourceUtilization should be a part of NodeReport API.
> ---
>
> Key: YARN-4291
> URL: https://issues.apache.org/jira/browse/YARN-4291
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970121#comment-14970121
 ] 

Hudson commented on YARN-4243:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #571 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/571/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 
0fce5f9a496925f0d53ea6c14318c9b513de9882)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM shuts down if the ZK connection is down while the RM is 
> doing its initialization. We need to add a retry to this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970159#comment-14970159
 ] 

Hadoop QA commented on YARN-4287:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m  9s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 26s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 50s | The applied patch generated  4 
new checkstyle issues (total was 257, now 259). |
| {color:red}-1{color} | whitespace |   0m 12s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  58m 19s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  99m  3s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768138/YARN-4287-v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9533/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9533/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9533/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9533/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9533/console |


This message was automatically generated.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality, so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal and offSwitch (today there is only one).
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed.
> A patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 
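
A sketch of the proposed two-threshold behaviour (the class, field and method 
names are invented for illustration; the real change lives in the capacity 
scheduler's delay-scheduling logic):
{code}
/**
 * Illustrative only: one delay gates the first rack-local assignment, a separate
 * delay gates off-switch assignments, and once rack-local allocation has started,
 * further rack-local assignments are not delayed again.
 */
class LocalityDelayPolicy {
  private final int nodeLocalityDelay;  // missed opportunities before going rack-local
  private final int offSwitchDelay;     // separate threshold before going off-switch
  private boolean rackLocalGranted = false;

  LocalityDelayPolicy(int nodeLocalityDelay, int offSwitchDelay) {
    this.nodeLocalityDelay = nodeLocalityDelay;
    this.offSwitchDelay = offSwitchDelay;
  }

  boolean canAssignRackLocal(long missedOpportunities) {
    if (rackLocalGranted) {
      // Once rack-local containers start flowing, keep assigning them instead of
      // re-waiting nodeLocalityDelay for every subsequent container.
      return true;
    }
    if (missedOpportunities >= nodeLocalityDelay) {
      rackLocalGranted = true;
      return true;
    }
    return false;
  }

  boolean canAssignOffSwitch(long missedOpportunities) {
    return missedOpportunities >= offSwitchDelay;
  }
}
{code}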



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-10-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970218#comment-14970218
 ] 

Sunil G commented on YARN-4292:
---

Hi [~leftnoteasy]. 
I would like to take this up if you have not started on it. Kindly let me know. 

> ResourceUtilization should be a part of NodeInfo REST API
> -
>
> Key: YARN-4292
> URL: https://issues.apache.org/jira/browse/YARN-4292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-10-22 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-4292:
-

Assignee: Sunil G

> ResourceUtilization should be a part of NodeInfo REST API
> -
>
> Key: YARN-4292
> URL: https://issues.apache.org/jira/browse/YARN-4292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970237#comment-14970237
 ] 

Hudson commented on YARN-4256:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2465 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2465/])
YARN-4256. YARN fair scheduler vcores with decimal values. Contributed (zxu: 
rev 960201b79b9f2ca40f8eadb21a2f9fe37dde2b5d)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java


> YARN fair scheduler vcores with decimal values
> --
>
> Key: YARN-4256
> URL: https://issues.apache.org/jira/browse/YARN-4256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4256.001.patch, YARN-4256.002.patch
>
>
> When a queue's vcores are configured with a decimal value, FairScheduler takes 
> the digits after the decimal point as the vcores.
> For the queue below,
> 2 mb,20 vcores,20.25 disks
> 3 mb,40.2 vcores,30.25 disks
> when many applications were submitted to the queue in parallel, all of them 
> stayed in the PENDING state because the vcores were taken as 2, skipping the 
> value 40.
> The pattern matching of vcores in FairSchedulerConfiguration.java has to be 
> improved to either throw 
> AllocationConfigurationException("Missing resource") or consider the value 
> before the decimal point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-10-22 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-4293:
-

Assignee: Sunil G

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode

2015-10-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970261#comment-14970261
 ] 

Varun Saxena commented on YARN-4127:


[~jianhe], updated the patch. It does not apply cleanly on branch-2.7, so I 
will upload a patch for that too.

> RM fail with noAuth error if switched from non-failover mode to failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127.01.patch, YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string. 
> If RM failover is then switched to disabled, the RM cannot load data from ZK 
> and fails with a noAuth error. After I reset the root node ACL, it can access 
> it again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
>  The problem may be that in non-failover mode, the RM doesn't use the *RM-ID* to 
> connect to ZK and thus fails with a NoAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970069#comment-14970069
 ] 

Jian He commented on YARN-4284:
---

I see, sounds good to me, thanks for your explanation. 

> condition for AM blacklisting is too narrow
> ---
>
> Key: YARN-4284
> URL: https://issues.apache.org/jira/browse/YARN-4284
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4284.001.patch
>
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is 
> limited to {{DISKS_FAILED}}. There is a whole host of other issues that may 
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks 
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in 
> blacklisting the nodes on *any failure* (although it might lead to 
> blacklisting the node more aggressively than necessary). I would propose 
> locating the next app attempt on a different node on any failure.
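
The gist of the proposed broadening, sketched with the public 
{{ContainerExitStatus}} constants (the surrounding class and method names are 
invented, and the exact set of excluded statuses is one plausible reading, not 
the patch itself):
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

/** Illustrative predicate only; not the actual RM code path. */
final class AmBlacklistPolicy {

  /** Current behaviour: blacklist the AM's node only when its disks failed. */
  static boolean shouldBlacklistNarrow(int amExitStatus) {
    return amExitStatus == ContainerExitStatus.DISKS_FAILED;
  }

  /** Proposed behaviour: blacklist on any AM container failure, except exits
   *  that were requested by the system rather than caused by the node. */
  static boolean shouldBlacklistBroad(int amExitStatus) {
    return amExitStatus != ContainerExitStatus.SUCCESS
        && amExitStatus != ContainerExitStatus.ABORTED
        && amExitStatus != ContainerExitStatus.PREEMPTED;
  }
}
{code}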



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue

2015-10-22 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970092#comment-14970092
 ] 

Ming Ma commented on YARN-2913:
---

Thanks [~l201514]. Can you please update FairScheduler.md?

> Fair scheduler should have ability to set MaxResourceDefault for each queue
> ---
>
> Key: YARN-2913
> URL: https://issues.apache.org/jira/browse/YARN-2913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, 
> YARN-2913.v3.patch, YARN-2913.v4.patch
>
>
> Queues that are created on the fly have the max resource of the entire 
> cluster. Fair Scheduler should have a default maxResource to control the 
> maxResource of those queues



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-10-22 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4290:


 Summary: "yarn nodes -list" should print all nodes reports 
information
 Key: YARN-4290
 URL: https://issues.apache.org/jira/browse/YARN-4290
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Wangda Tan


Currently, "yarn nodes -list" command only shows 
- "Node-Id", 
- "Node-State", 
- "Node-Http-Address",
- "Number-of-Running-Containers"

I think we need to show more information such as used resource, just like "yarn 
nodes -status" command.

Maybe we can add a parameter to -list, such as "-show-details" to enable 
printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970197#comment-14970197
 ] 

Hudson commented on YARN-4243:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #586 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/586/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 
0fce5f9a496925f0d53ea6c14318c9b513de9882)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java


> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM shuts down if the ZK connection is down while the RM is 
> doing its initialization. We need to add a retry to this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue

2015-10-22 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2913:
--
Attachment: YARN-2913.v5.patch

> Fair scheduler should have ability to set MaxResourceDefault for each queue
> ---
>
> Key: YARN-2913
> URL: https://issues.apache.org/jira/browse/YARN-2913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, 
> YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch
>
>
> Queues that are created on the fly have the max resource of the entire 
> cluster. Fair Scheduler should have a default maxResource to control the 
> maxResource of those queues



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler

2015-10-22 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-3738:
-
Attachment: YARN-3738-v3.patch

Touching the patch to retrigger test-patch, as Jenkins seems to have got confused 
by the YARN-4243 commit.

> Add support for recovery of reserved apps (running under dynamic queues) to 
> Capacity Scheduler
> --
>
> Key: YARN-3738
> URL: https://issues.apache.org/jira/browse/YARN-3738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3738-v2.patch, YARN-3738-v3.patch, YARN-3738.patch
>
>
> YARN-3736 persists the current state of the Plan to the RMStateStore. This 
> JIRA covers recovery of the Plan, i.e. dynamic reservation queues with 
> associated apps as part Capacity Scheduler failover mechanism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler

2015-10-22 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970245#comment-14970245
 ] 

Subru Krishnan commented on YARN-3738:
--

I tested the following scenarios successfully on my machine:
   * Created reservation, submitted jobs to it. Restarted RM and submitted more 
jobs to the recovered reservation.
   * Created reservation, submitted a longish job to it. Restarted RM and 
verified that the job was recovered under the reservation.
   * Created reservation, submitted a longish job to it towards its deadline. 
Restarted RM and verified that the job was recovered under the default 
reservation.

> Add support for recovery of reserved apps (running under dynamic queues) to 
> Capacity Scheduler
> --
>
> Key: YARN-3738
> URL: https://issues.apache.org/jira/browse/YARN-3738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3738-v2.patch, YARN-3738-v3.patch, YARN-3738.patch
>
>
> YARN-3736 persists the current state of the Plan to the RMStateStore. This 
> JIRA covers recovery of the Plan, i.e. dynamic reservation queues with 
> associated apps as part Capacity Scheduler failover mechanism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-22 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3216:
--
Attachment: 0009-YARN-3216.patch

Hi [~leftnoteasy],
Thank you for sharing the comments. Updating the patch to address the same.

bq.2) Will tests fail if revert changes of TestNodeLabelContainerAllocation
Yes, the tests will fail. For example, if we do not give any label to {{submitApp}}, 
it will try to allocate to the default label, because in MockRM we set 
NO_LABEL in all AM resource requests by default. And 
{{MockRM.launchAndRegisterAM(app1, rm1, nm1);}} will try to schedule on *nm1*, 
which only takes "x". So the AM will not be allocated; rather, it will stay in 
the SCHEDULED state.
{code}
MockNM nm1 = rm1.registerNode("h1:1234", 8000); // label = x
rm1.registerNode("h2:1234", 8000); // label = y
MockNM nm3 = rm1.registerNode("h3:1234", 8000); // label = 

// launch an app to queue a1 (label = x), and check all container will
// be allocated in h1
RMApp app1 = rm1.submitApp(200, "app", "user", null, "a1", "x");
MockAM am1 = MockRM.launchAndRegisterAM(app1, rm1, nm1);
{code}

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3739) Add reservation system recovery to RM recovery process

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969199#comment-14969199
 ] 

Hudson commented on YARN-3739:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8688 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8688/])
YARN-3739. Add reservation system recovery to RM recovery process. (adhoot: rev 
2798723a5443d04455b9d79c48d61f435ab52267)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/PlanningAlgorithm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystem.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanEdit.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacitySchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java


> Add reservation system recovery to RM recovery process
> --
>
> Key: YARN-3739
> URL: https://issues.apache.org/jira/browse/YARN-3739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3739-v1.patch, YARN-3739-v2.patch, 
> YARN-3739-v3.patch
>
>
> YARN-1051 introduced a reservation system in the YARN RM. This JIRA tracks 
> the recovery of the reservation system in case of a RM failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3739) Add reservation system recovery to RM recovery process

2015-10-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3739:

Fix Version/s: 2.8.0

> Add reservation system recovery to RM recovery process
> --
>
> Key: YARN-3739
> URL: https://issues.apache.org/jira/browse/YARN-3739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.8.0
>
> Attachments: YARN-3739-v1.patch, YARN-3739-v2.patch, 
> YARN-3739-v3.patch
>
>
> YARN-1051 introduced a reservation system in the YARN RM. This JIRA tracks 
> the recovery of the reservation system in case of a RM failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM restart

2015-10-22 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969205#comment-14969205
 ] 

Chang Li commented on YARN-4288:


Hi [~djp], I have worked on YARN-4132; would that also address this issue?

> NodeManager restart should keep retrying to register to RM while connection 
> exception happens during RM restart
> ---
>
> Key: YARN-4288
> URL: https://issues.apache.org/jira/browse/YARN-4288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> When the NM gets restarted, NodeStatusUpdaterImpl will try to register with the 
> RM over RPC, which can throw exceptions like the following when the RM is 
> restarted at the same time:
> {noformat}
> 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl 
> (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - 
> Unexpected error rebooting NodeStatusUpdater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: "172.27.62.28"; 
> destination host is: "172.27.62.57":8025;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1473)
> at org.apache.hadoop.ipc.Client.call(Client.java:1400)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
> 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager 
> (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: 
> Failed on local exception: java.io.IOException: Connection reset by peer; 
> Host Details : local host is: "172.27.62.28"; destination host is: 
> "172.27.62.57":8025;
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:223)
> at 
> 

[jira] [Updated] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM

2015-10-22 Thread Pradeep Subrahmanion (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Subrahmanion updated YARN-1565:
---
Attachment: YARN-1565-003.patch

Fixed the mistake that caused the test to fail; the test passes locally. 
Attached the latest patch. Steve Loughran - can you please submit this patch 
for me?

> Add a way for YARN clients to get critical YARN system properties from the RM
> -
>
> Key: YARN-1565
> URL: https://issues.apache.org/jira/browse/YARN-1565
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
> Attachments: YARN-1565-001.patch, YARN-1565-002.patch, 
> YARN-1565-003.patch
>
>
> If you are trying to build up an AM request, you need to know
> # the limits of memory and cores for the chosen queue
> # the existing YARN classpath
> # the path separator for the target platform (so your classpath comes out 
> right)
> # the cluster OS, in case you need some OS-specific changes
> The classpath can be in yarn-site.xml, but a remote client may not have that. 
> The site XML file doesn't list queue resource limits, the cluster OS or the 
> path separator.
> A way to query the RM for these values would make it easier for YARN clients 
> to build up AM submissions with less guesswork and client-side config.
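
For illustration only (not part of the attached patches): a minimal client-side sketch, assuming a reachable RM and the stock YarnClient API, of what a remote client can already learn when it asks the RM for a new application. The resource ceiling comes back with the response; the YARN classpath, path separator and cluster OS are exactly the values that still have to come from client-side config, which is the gap this JIRA targets. The class name below is made up.

{code}
import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmRequestLimits {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      // Ask the RM for a new application; the response carries the maximum
      // resource capability the RM will accept for the AM container.
      YarnClientApplication app = yarnClient.createApplication();
      GetNewApplicationResponse resp = app.getNewApplicationResponse();
      Resource max = resp.getMaximumResourceCapability();
      System.out.println("max AM memory (MB) = " + max.getMemory());
      System.out.println("max AM vcores      = " + max.getVirtualCores());
      // Classpath, path separator and cluster OS are not available here and
      // still have to be guessed or shipped in client-side config.
    } finally {
      yarnClient.stop();
    }
  }
}
{code}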



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969204#comment-14969204
 ] 

Hadoop QA commented on YARN-3738:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767945/YARN-3738.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2798723 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9521/console |


This message was automatically generated.

> Add support for recovery of reserved apps (running under dynamic queues) to 
> Capacity Scheduler
> --
>
> Key: YARN-3738
> URL: https://issues.apache.org/jira/browse/YARN-3738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3738.patch
>
>
> YARN-3736 persists the current state of the Plan to the RMStateStore. This 
> JIRA covers recovery of the Plan, i.e. dynamic reservation queues with 
> associated apps, as part of the Capacity Scheduler failover mechanism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3739) Add reservation system recovery to RM recovery process

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969220#comment-14969220
 ] 

Hudson commented on YARN-3739:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1304 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1304/])
YARN-3739. Add reservation system recovery to RM recovery process. (adhoot: rev 
2798723a5443d04455b9d79c48d61f435ab52267)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestReservationSystemWithRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/PlanningAlgorithm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSchedulerPlanFollowerBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanEdit.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacitySchedulerPlanFollower.java


> Add reservation system recovery to RM recovery process
> --
>
> Key: YARN-3739
> URL: https://issues.apache.org/jira/browse/YARN-3739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.8.0
>
> Attachments: YARN-3739-v1.patch, YARN-3739-v2.patch, 
> YARN-3739-v3.patch
>
>
> YARN-1051 introduced a reservation system in the YARN RM. This JIRA tracks 
> the recovery of the reservation system in case of a RM failover.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM restart

2015-10-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969260#comment-14969260
 ] 

Junping Du commented on YARN-4288:
--

Thanks [~lichangleo] for pointing that JIRA out.
I think YARN-4132 may not help in this case because RMProxy uses 
RetryPolicies.TRY_ONCE_THEN_FAIL when HA is enabled, so the registration still 
fails without any retry. We should have a way to retry the NM registration to 
the RM in the RM HA case.
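
To make the idea concrete, here is a minimal sketch of the kind of bounded retry being discussed, assuming we simply loop around the registration call while the RM is unreachable. The names, retry bounds and backoff are made up for illustration; this is not the eventual fix.

{code}
import java.io.IOException;

// Sketch only: retry the NM -> RM registration on connection-level
// IOExceptions (e.g. "Connection reset by peer") instead of failing on the
// first attempt while the RM is still coming back up.
public final class RegisterWithRetry {

  interface Registration {
    void register() throws IOException;    // stands in for registerWithRM()
  }

  static void registerWithRetry(Registration r, int maxAttempts, long backoffMs)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        r.register();
        return;                            // registered successfully
      } catch (IOException e) {
        last = e;                          // remember the latest failure
        Thread.sleep(backoffMs * attempt); // simple linear backoff
      }
    }
    throw last;                            // give up after maxAttempts
  }
}
{code}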

> NodeManager restart should keep retrying to register to RM while connection 
> exception happens during RM restart
> ---
>
> Key: YARN-4288
> URL: https://issues.apache.org/jira/browse/YARN-4288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> When the NM gets restarted, NodeStatusUpdaterImpl tries to register to the RM 
> over RPC, which can throw exceptions such as the following when the RM is 
> being restarted at the same time:
> {noformat}
> 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl 
> (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - 
> Unexpected error rebooting NodeStatusUpdater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: "172.27.62.28"; 
> destination host is: "172.27.62.57":8025;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1473)
> at org.apache.hadoop.ipc.Client.call(Client.java:1400)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
> 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager 
> (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: 
> Failed on local exception: java.io.IOException: Connection reset by peer; 
> Host Details : local host is: "172.27.62.28"; destination host is: 
> "172.27.62.57":8025;
> at 
> 

[jira] [Commented] (YARN-3739) Add recovery of reservation system to RM failover process

2015-10-22 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969137#comment-14969137
 ] 

Anubhav Dhoot commented on YARN-3739:
-

+1

> Add recovery of reservation system to RM failover process
> -
>
> Key: YARN-3739
> URL: https://issues.apache.org/jira/browse/YARN-3739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3739-v1.patch, YARN-3739-v2.patch, 
> YARN-3739-v3.patch
>
>
> YARN-1051 introduced a reservation system in the YARN RM. This JIRA tracks 
> the recovery of the reservation system in case of a RM failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970292#comment-14970292
 ] 

Hadoop QA commented on YARN-2729:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  21m 25s | Pre-patch trunk has 4 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 57s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 43s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 16s | The applied patch generated  1 
new checkstyle issues (total was 212, now 212). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m  3s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  3s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   9m 13s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  53m 43s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 117m 11s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestRMNMRPCResponseId |
|   | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog |
|   | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
|   | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs |
|   | hadoop.yarn.server.resourcemanager.TestResourceManager |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterService |
|   | hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels |
|   | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
 |
|   | 
hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry |
|   | hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor |
|   | hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
 |
|   | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestQueueMetrics |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerFairShare |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability |
|   | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestSchedulingUpdate |
|   | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768168/YARN-2729.20151023-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| Pre-patch Findbugs warnings | 

[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970359#comment-14970359
 ] 

Hadoop QA commented on YARN-2913:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  23m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m  5s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 49s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m 44s | Site still builds. |
| {color:red}-1{color} | checkstyle |   0m 54s | The applied patch generated  2 
new checkstyle issues (total was 37, now 35). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 38s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  60m 17s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 113m 52s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768181/YARN-2913.v5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 124a412 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9535/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9535/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9535/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9535/console |


This message was automatically generated.

> Fair scheduler should have ability to set MaxResourceDefault for each queue
> ---
>
> Key: YARN-2913
> URL: https://issues.apache.org/jira/browse/YARN-2913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, 
> YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch
>
>
> Queues that are created on the fly have the max resource of the entire 
> cluster. The Fair Scheduler should have a default maxResource that caps the 
> maxResource of those queues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode

2015-10-22 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4127:
---
Attachment: YARN-4127-branch-2.7.01.patch

> RM fail with noAuth error if switched from failover mode to non-failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, 
> YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string.
> If RM failover is then switched off, the RM cannot load data from ZK and 
> fails with a noAuth error. After I reset the root node ACL, it can access 
> the data again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that in non-failover mode, the RM doesn't use the *RM-ID* 
> to connect with ZK and thus fails with a noAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user.
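
For the record, the manual workaround mentioned above ("reset the root node ACL") can be scripted; the sketch below is illustrative only and makes two assumptions: the store lives at the default path (yarn.resourcemanager.zk-state-store.parent-path = /rmstore, root znode ZKRMStateRoot) and the caller holds credentials with ADMIN permission on that node (the digest below is a placeholder). It is not part of the attached patches.

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.ZooDefs;

public class ResetRmRootNodeAcl {
  public static void main(String[] args) throws Exception {
    CuratorFramework zk = CuratorFrameworkFactory.builder()
        .connectString("zk1:2181,zk2:2181,zk3:2181")            // placeholder quorum
        .authorization("digest", "super:superpass".getBytes())  // placeholder auth
        .retryPolicy(new ExponentialBackoffRetry(1000, 3))
        .build();
    zk.start();
    try {
      // Open up the fenced root node so a non-HA RM can read the store again.
      zk.setACL()
        .withACL(ZooDefs.Ids.OPEN_ACL_UNSAFE)
        .forPath("/rmstore/ZKRMStateRoot");
    } finally {
      zk.close();
    }
  }
}
{code}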



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970416#comment-14970416
 ] 

Hadoop QA commented on YARN-3216:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 12s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 29s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 49s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 12s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  57m 47s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  98m 29s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart 
|
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768201/0010-YARN-3216.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9538/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9538/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9538/console |


This message was automatically generated.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.
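
As a rough illustration of the intended semantics (the numbers and the helper below are made up, not taken from the patches): with max-am-resource-percent = 0.1, a queue that can access both the default partition (200 GB) and a partition x (100 GB) would get a separate AM limit per partition instead of one limit derived from the default partition only.

{code}
// Hypothetical illustration: per-partition AM limit computed from that
// partition's resource rather than always from the default partition.
public class PerPartitionAmLimit {
  static long maxAmMb(long partitionMemoryMb, float maxAmResourcePercent) {
    return (long) (partitionMemoryMb * maxAmResourcePercent);
  }

  public static void main(String[] args) {
    // max-am-resource-percent = 0.1
    System.out.println(maxAmMb(200 * 1024, 0.1f)); // default partition -> 20480
    System.out.println(maxAmMb(100 * 1024, 0.1f)); // partition x       -> 10240
  }
}
{code}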



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-22 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4041:
--
Attachment: 0005-YARN-4041.patch

Yes [~jlowe], we can compare with the token itself and wait in smaller units. 
In the new patch I kept a total wait time of 1 s, but in 10 ms units. Locally 
the test run seems faster. Uploading a new patch.
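
Roughly the shape of the change described above, as a sketch with made-up names (not the patch itself): poll in 10 ms steps up to a 1 s ceiling and return as soon as the condition (e.g. the renewed token matching the expected one) holds, instead of sleeping for the whole interval.

{code}
import java.util.concurrent.Callable;

// Sketch only: wait in small 10 ms units, up to 1 s in total, and stop early
// once the condition holds.
public final class SmallUnitWait {
  static boolean waitFor(Callable<Boolean> condition,
                         long totalWaitMs, long stepMs) throws Exception {
    long deadline = System.currentTimeMillis() + totalWaitMs;
    while (System.currentTimeMillis() < deadline) {
      if (condition.call()) {
        return true;          // condition met, no need to burn the full 1 s
      }
      Thread.sleep(stepMs);   // 10 ms step instead of one long sleep
    }
    return condition.call();  // one last check at the deadline
  }
}
{code}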

> Slow delegation token renewal can severely prolong RM recovery
> --
>
> Key: YARN-4041
> URL: https://issues.apache.org/jira/browse/YARN-4041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, 
> 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch
>
>
> When the RM does a work-preserving restart, it synchronously tries to renew 
> delegation tokens for every active application. If a token server happens to 
> be down or running slowly, and a lot of the active apps were using tokens 
> from that server, it can have a huge impact on the time it takes the RM to 
> process the restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970352#comment-14970352
 ] 

Hadoop QA commented on YARN-3738:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 31s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   9m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 56s | The applied patch generated  2 
new checkstyle issues (total was 189, now 189). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 39s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  62m 33s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 108m 12s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesForCSWithPartitions |
|   | hadoop.yarn.server.resourcemanager.TestRM |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768186/YARN-3738-v3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9536/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9536/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9536/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9536/console |


This message was automatically generated.

> Add support for recovery of reserved apps (running under dynamic queues) to 
> Capacity Scheduler
> --
>
> Key: YARN-3738
> URL: https://issues.apache.org/jira/browse/YARN-3738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3738-v2.patch, YARN-3738-v3.patch, YARN-3738.patch
>
>
> YARN-3736 persists the current state of the Plan to the RMStateStore. This 
> JIRA covers recovery of the Plan, i.e. dynamic reservation queues with 
> associated apps, as part of the Capacity Scheduler failover mechanism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970427#comment-14970427
 ] 

Hadoop QA commented on YARN-4127:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768217/YARN-4127-branch-2.7.01.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9539/console |


This message was automatically generated.

> RM fail with noAuth error if switched from failover mode to non-failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, 
> YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string.
> If RM failover is then switched off, the RM cannot load data from ZK and 
> fails with a noAuth error. After I reset the root node ACL, it can access 
> the data again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that in non-failover mode, the RM doesn't use the *RM-ID* 
> to connect with ZK and thus fails with a noAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970285#comment-14970285
 ] 

Wangda Tan commented on YARN-4292:
--

Thanks for your interest in this, [~sunilg]. Please go ahead!

> ResourceUtilization should be a part of NodeInfo REST API
> -
>
> Key: YARN-4292
> URL: https://issues.apache.org/jira/browse/YARN-4292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

