[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-11-05 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991374#comment-14991374
 ] 

nijel commented on YARN-2934:
-

Thanks [~Naganarasimha] for the patch.

A few minor comments/doubts:

1.
{code}
FileStatus[] listStatus =
    fileSystem.listStatus(containerLogDir, new PathFilter() {
      @Override
      public boolean accept(Path path) {
        return FilenameUtils.wildcardMatch(path.getName(),
            errorFileNamePattern, IOCase.INSENSITIVE);
      }
    });
{code}
What if this gives multiple error files?
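If several files can match, one illustrative way to handle it (just a sketch, 
not what the patch currently does) would be to prefer the most recently 
modified match:
{code}
// Illustrative only: if several files match the wildcard, pick the newest
// one; errorFile stays null when there are no matches.
FileStatus latest = null;
for (FileStatus candidate : listStatus) {
  if (latest == null
      || candidate.getModificationTime() > latest.getModificationTime()) {
    latest = candidate;
  }
}
Path errorFile = (latest == null) ? null : latest.getPath();
{code}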


2. 
{code}
} catch (IOException e) {
  LOG.warn("Failed while trying to read container's error log", e);
}
{code}
Can this be logged at the error level? I think there should not be any exception 
while reading the file; if an error does occur, it is better to log it as an error.
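In other words, just the suggested change (sketch):
{code}
} catch (IOException e) {
  // Reading the error file is not expected to fail; if it does, make it visible.
  LOG.error("Failed while trying to read container's error log", e);
}
{code}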



> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4331) Restarting NodeManager leaves orphaned containers

2015-11-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991729#comment-14991729
 ] 

Jason Lowe commented on YARN-4331:
--

SAMZA-750 is discussing RM restart, but this is NM restart.  They are related 
but mostly independent features, and one can be enabled without the other.  
Check if yarn.nodemanager.recovery.enabled=true on that node.  If you want to 
support rolling upgrades of the entire YARN cluster they both need to be 
enabled, but if you simply want to restart/upgrade a NodeManager independent of 
the ResourceManager then you can turn on nodemanager restart without 
resourcemanager restart.  NodeManager restart should be mostly invisible to 
applications except for interruptions in the auxiliary services on that node 
(e.g.: shuffle handler).

bq. if the application master (AM) is dead, shouldn't it be the responsibility of 
the container to kill itself?

That is completely application framework dependent and not the responsibility 
of YARN.  A container is completely under the control of the application (i.e.: 
user code) and doesn't have to have any YARN code in it at all.  Theoretically 
one could write an application entirely in C or Go or whatever that generates 
compatible protocol buffers and adheres to the YARN RPC protocol semantics.  No 
YARN code would be running at all for that application or in any of its 
containers at that point.  (I know of no such applications, but it is 
theoretically possible.)

Also it is not a requirement that containers have an umbilical connection to 
the ApplicationMaster.  That choice is up to the application, and some 
applications don't do this (like the distributed shell sample YARN 
application).  MapReduce is an application framework that does have an 
umbilical connection, but if there's a bug in that app where tasks don't 
properly recognize the umbilical was severed then that's a bug in the app and 
not a bug in YARN.  Once the nodemanager died on the node, YARN lost all 
ability to control containers on that node.  If the container decides not to 
exit then that's an issue with the app more than an issue with YARN.  There's 
not much YARN can do about it since YARN's actor on that node is no longer 
present.

If NM restart is not enabled then the nodemanager should _not_ be killed with 
SIGKILL.  Simply kill it with SIGTERM and the nodemanager should attempt to 
kill all containers before shutting down.  Killing the NM with SIGKILL is 
normally only done when performing a work-preserving restart on the NM, and 
that requires yarn.nodemanager.recovery.enabled=true on that node to 
function properly.
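
For reference, a rough sketch (not from any patch here; it only assumes the 
standard {{YarnConfiguration}} keys) of guarding a SIGKILL-based restart on 
that setting:
{code}
// Sketch: fail fast if a work-preserving (SIGKILL-based) NM restart is
// attempted while NM recovery is disabled.
Configuration conf = new YarnConfiguration();
boolean recoveryEnabled = conf.getBoolean(
    YarnConfiguration.NM_RECOVERY_ENABLED,
    YarnConfiguration.DEFAULT_NM_RECOVERY_ENABLED);
if (!recoveryEnabled) {
  throw new IllegalStateException(
      "yarn.nodemanager.recovery.enabled is false; stop the NM with SIGTERM "
      + "so it can kill its containers before shutting down");
}
{code}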

> Restarting NodeManager leaves orphaned containers
> -
>
> Key: YARN-4331
> URL: https://issues.apache.org/jira/browse/YARN-4331
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
>Reporter: Joseph
>Priority: Critical
>
> We are seeing a lot of orphaned containers running in our production clusters.
> I tried to simulate this locally on my machine and can replicate the issue by 
> killing nodemanager.
> I'm running Yarn 2.7.1 with RM state stored in zookeeper and deploying samza 
> jobs.
> Steps:
> {quote}1. Deploy a job 
> 2. Issue a kill -9 signal to nodemanager 
> 3. We should see the AM and its container running without nodemanager
> 4. AM should die but the container still keeps running
> 5. Restarting nodemanager brings up new AM and container but leaves the 
> orphaned container running in the background
> {quote}
> This is effectively causing double processing of data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991492#comment-14991492
 ] 

Steve Loughran commented on YARN-3946:
--

I'd like to see this in application reports, so that client-side applications 
can display the details

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> 
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Attachments: YARN-3946.v1.001.patch, YARN3946_attemptDiagnistic 
> message.png
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/ memory requirement not being met, or AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991501#comment-14991501
 ] 

Naganarasimha G R commented on YARN-3946:
-

Thanks [~steve_l]. I am reworking the patch based on [~wangda]'s comments; I will 
consider this as well and upload a new patch.

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> 
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Attachments: YARN-3946.v1.001.patch, YARN3946_attemptDiagnistic 
> message.png
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/ memory requirement not being met, or AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4331) Restarting NodeManager leaves orphaned containers

2015-11-05 Thread Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991575#comment-14991575
 ] 

Joseph commented on YARN-4331:
--

[~jlowe] Thanks for your comments, very helpful.
yarn.resourcemanager.work-preserving-recovery.enabled is indeed set to false. 
The reason we set it to false is that we run Samza jobs on the YARN cluster and 
they don't work well with this feature turned on 
(https://issues.apache.org/jira/browse/SAMZA-750).

Apologies for my ignorance in this area, but if the application master (AM) is 
dead, shouldn't it be the responsibility of the container to kill itself? I'd 
imagine every container should be required to heartbeat to its application 
master and kill itself if it misses a few?
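
For example, something like this purely hypothetical container-side umbilical 
({{pingApplicationMaster()}} is an assumed framework-specific call, not a YARN 
API):
{code}
// Hypothetical container-side umbilical: exit if the AM misses a few pings.
final AtomicInteger missed = new AtomicInteger(0);
ScheduledExecutorService umbilical =
    Executors.newSingleThreadScheduledExecutor();
umbilical.scheduleAtFixedRate(new Runnable() {
  @Override
  public void run() {
    try {
      pingApplicationMaster();           // assumed framework-specific RPC
      missed.set(0);
    } catch (Exception e) {
      if (missed.incrementAndGet() >= 3) {
        System.exit(1);                  // AM presumed dead; stop doing work
      }
    }
  }
}, 10, 10, TimeUnit.SECONDS);
{code}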


> Restarting NodeManager leaves orphaned containers
> -
>
> Key: YARN-4331
> URL: https://issues.apache.org/jira/browse/YARN-4331
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
>Reporter: Joseph
>Priority: Critical
>
> We are seeing a lot of orphaned containers running in our production clusters.
> I tried to simulate this locally on my machine and can replicate the issue by 
> killing nodemanager.
> I'm running Yarn 2.7.1 with RM state stored in zookeeper and deploying samza 
> jobs.
> Steps:
> {quote}1. Deploy a job 
> 2. Issue a kill -9 signal to nodemanager 
> 3. We should see the AM and its container running without nodemanager
> 4. AM should die but the container still keeps running
> 5. Restarting nodemanager brings up new AM and container but leaves the 
> orphaned container running in the background
> {quote}
> This is effectively causing double processing of data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-11-05 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4312:
--
Fix Version/s: 2.6.3

Cherry-picked the fix to branch-2.6. Thanks [~varun_saxena]!

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.2, 2.6.3
>
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do a ZK sync operation on RM startup after 
> YARN-3798, which delays RM startup a bit and makes the 5 s timeouts too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> 

[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-11-05 Thread Brook Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992234#comment-14992234
 ] 

Brook Zhou commented on YARN-3223:
--

The unit tests that failed are not affected by this patch. They may be related to 
[YARN-2634|https://issues.apache.org/jira/browse/YARN-2634].

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, 
> YARN-3223-v2.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: make RMNode keep track of the old resource for possible rollback, 
> keep the available resource at 0, and have the used resource updated when 
> containers finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2882) Introducing container types

2015-11-05 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992207#comment-14992207
 ] 

Konstantinos Karanasos commented on YARN-2882:
--

Thanks for the feedback, [~asuresh].
I will address point (2), and will fix the patch so that it applies to the 
yarn-2877 branch I created off apache trunk yesterday.

Moreover, I will make sure we align with YARN-3116 that introduced container 
types for distinguishing the AM container.

Regarding the builder pattern, do you think we should address that here or is 
it better to create a separate JIRA for having a more principled way to create 
new instances of resources?
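
For concreteness, the kind of thing I had in mind (entirely hypothetical; no 
such builder exists in YARN today, and Resource instances are currently created 
with {{Resource.newInstance(memory, vCores)}}):
{code}
// Entirely hypothetical builder for Resource instances.
Resource capability = new ResourceBuilder()   // ResourceBuilder is made up here
    .memory(2048)                             // MB
    .vCores(2)
    .build();
{code}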

> Introducing container types
> ---
>
> Key: YARN-2882
> URL: https://issues.apache.org/jira/browse/YARN-2882
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: yarn-2882.patch
>
>
> This JIRA introduces the notion of container types.
> We propose two initial types of containers: guaranteed-start and queueable 
> containers.
> Guaranteed-start are the existing containers, which are allocated by the 
> central RM and are instantaneously started, once allocated.
> Queueable is a new type of container, which allows containers to be queued in 
> the NM, thus their execution may be arbitrarily delayed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992803#comment-14992803
 ] 

Hadoop QA commented on YARN-3840:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
18s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 13s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 47s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 53s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 23s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 54s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 39s 
{color} | {color:green} hadoop-mapreduce-client-app in the patch passed with 
JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 7s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.7.0_79. {color} |
| 

[jira] [Commented] (YARN-4219) New levelDB cache storage for timeline v1.5

2015-11-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992751#comment-14992751
 ] 

Hadoop QA commented on YARN-4219:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 (total was 59, now 63). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 20s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 43s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 21s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-05 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12770890/YARN-4219-trunk.003.patch
 |
| JIRA Issue | YARN-4219 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  xml  compile  
findbugs  checkstyle  |
| uname | Linux 2d0c4cb2ffed 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Updated] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"

2015-11-05 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4334:
---
Attachment: YARN-4334.wip.patch

Uploaded a prototype patch, which heartbeats to the LeveldbRMStateStore; on RM 
recovery it checks whether the state store has expired.
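
Roughly, the check on recovery looks like this (simplified sketch, not the exact 
code in the attached patch; the helper method and config key below are 
placeholders):
{code}
// Simplified idea: skip recovery if the store's last heartbeat timestamp is
// older than a configured maximum age.
long lastHeartbeat = stateStore.getLastStoreHeartbeatTime();   // placeholder helper
long maxAgeMs = conf.getLong(
    "yarn.resourcemanager.state-store.max-age-ms", 0L);        // placeholder key, 0 = disabled
if (maxAgeMs > 0
    && System.currentTimeMillis() - lastHeartbeat > maxAgeMs) {
  LOG.warn("RM state store is older than " + maxAgeMs + " ms; skipping recovery");
  return;   // come up with no applications recovered
}
{code}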

> Ability to avoid ResourceManager recovery if state store is "too old"
> -
>
> Key: YARN-4334
> URL: https://issues.apache.org/jira/browse/YARN-4334
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Chang Li
> Attachments: YARN-4334.wip.patch
>
>
> There are times when a ResourceManager has been down long enough that 
> ApplicationMasters and potentially external client-side monitoring mechanisms 
> have given up completely.  If the ResourceManager starts back up and tries to 
> recover we can get into situations where the RM launches new application 
> attempts for the AMs that gave up, but then the client _also_ launches 
> another instance of the app because it assumed everything was dead.
> It would be nice if the RM could be optionally configured to avoid trying to 
> recover if the state store was "too old."  The RM would come up without any 
> applications recovered, but we would avoid a double-submission situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4219) New levelDB cache storage for timeline v1.5

2015-11-05 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4219:

Attachment: YARN-4219-trunk.003.patch

Attaching 003 patch to fix javac warnings and javadoc errors. 

> New levelDB cache storage for timeline v1.5
> ---
>
> Key: YARN-4219
> URL: https://issues.apache.org/jira/browse/YARN-4219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4219-trunk.001.patch, YARN-4219-trunk.002.patch, 
> YARN-4219-trunk.003.patch
>
>
> We need to have an "offline" caching storage for timeline server v1.5 after 
> the changes in YARN-3942. The in memory timeline storage may run into OOM 
> issues when used as a cache storage for entity file timeline storage. We can 
> refactor the code and have a level db based caching storage for this use 
> case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"

2015-11-05 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-4334:


 Summary: Ability to avoid ResourceManager recovery if state store 
is "too old"
 Key: YARN-4334
 URL: https://issues.apache.org/jira/browse/YARN-4334
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Jason Lowe


There are times when a ResourceManager has been down long enough that 
ApplicationMasters and potentially external client-side monitoring mechanisms 
have given up completely.  If the ResourceManager starts back up and tries to 
recover we can get into situations where the RM launches new application 
attempts for the AMs that gave up, but then the client _also_ launches another 
instance of the app because it assumed everything was dead.

It would be nice if the RM could be optionally configured to avoid trying to 
recover if the state store was "too old."  The RM would come up without any 
applications recovered, but we would avoid a double-submission situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"

2015-11-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992405#comment-14992405
 ] 

Jason Lowe commented on YARN-4334:
--

This would probably involve some sort of "heartbeat" to the state store to keep 
track of an approximate last uptime of the ResourceManager.  We would not want 
to update the state store very often, probably only on the order of a minute or 
so.

One key use case for this is Oozie.  Oozie launchers have a known problem where, 
when they restart, they re-launch applications.  If the launcher AM gives up and 
the sub-job's AM gives up, then when the RM recovers and re-launches AM attempts 
for both jobs, the launcher will re-submit the job.  Then there will be two 
instances of the sub-job running, which is undesirable.  I suspect there are 
other job-launches-job situations besides Oozie where this would also be 
problematic.
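
To sketch the heartbeat side (illustrative only; {{storeLastUptime}} is a 
hypothetical state-store method, not an existing API):
{code}
// Illustrative heartbeat writer: record "RM alive at time T" in the state
// store roughly once a minute, so the store is not updated too often.
ScheduledExecutorService heartbeat =
    Executors.newSingleThreadScheduledExecutor();
heartbeat.scheduleAtFixedRate(new Runnable() {
  @Override
  public void run() {
    try {
      stateStore.storeLastUptime(System.currentTimeMillis());  // hypothetical method
    } catch (Exception e) {
      LOG.warn("Failed to heartbeat the RM state store", e);
    }
  }
}, 1, 1, TimeUnit.MINUTES);
{code}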

> Ability to avoid ResourceManager recovery if state store is "too old"
> -
>
> Key: YARN-4334
> URL: https://issues.apache.org/jira/browse/YARN-4334
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Chang Li
>
> There are times when a ResourceManager has been down long enough that 
> ApplicationMasters and potentially external client-side monitoring mechanisms 
> have given up completely.  If the ResourceManager starts back up and tries to 
> recover we can get into situations where the RM launches new application 
> attempts for the AMs that gave up, but then the client _also_ launches 
> another instance of the app because it assumed everything was dead.
> It would be nice if the RM could be optionally configured to avoid trying to 
> recover if the state store was "too old."  The RM would come up without any 
> applications recovered, but we would avoid a double-submission situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"

2015-11-05 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li reassigned YARN-4334:
--

Assignee: Chang Li

> Ability to avoid ResourceManager recovery if state store is "too old"
> -
>
> Key: YARN-4334
> URL: https://issues.apache.org/jira/browse/YARN-4334
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Chang Li
>
> There are times when a ResourceManager has been down long enough that 
> ApplicationMasters and potentially external client-side monitoring mechanisms 
> have given up completely.  If the ResourceManager starts back up and tries to 
> recover we can get into situations where the RM launches new application 
> attempts for the AMs that gave up, but then the client _also_ launches 
> another instance of the app because it assumed everything was dead.
> It would be nice if the RM could be optionally configured to avoid trying to 
> recover if the state store was "too old."  The RM would come up without any 
> applications recovered, but we would avoid a double-submission situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4330) MiniYARNCluster prints multiple Failed to instantiate default resource calculator warning messages

2015-11-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992473#comment-14992473
 ] 

Varun Saxena commented on YARN-4330:


It's not retrying per se; we simply try to get memory and CPU info in multiple 
places. In some places (for monitoring) it reads the calculator plugin class 
from config, and in others it directly instantiates the default one (while 
trying to detect the NM's CPU/memory capability). The default resource 
calculator plugin does not support Mac, hence the UnsupportedOperationException.

While monitoring, if the resource calculator plugin class is not configured, 
loading the default calculator plugin (and hence hitting this code path) is the 
default behavior. We can't really switch it off, but we need not print the 
whole stack trace for the UnsupportedOperationException.

For MiniYARNCluster, we can do a few more things. When we try to load the 
default resource calculator plugin via the NodeManagerHardwareUtils class (to 
detect the NM's CPU/memory), we can switch off this behavior via a config: the 
code can be rearranged so that the config is checked first and this error does 
not show up, and the config can be set to false in MiniYARNCluster. 
Alternatively, a dummy plugin implementation can be included in MiniYARNCluster 
and set in the config so that it does not try to load the default resource 
calculator.
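
To illustrate the dummy-plugin option (sketch only; {{DummyResourceCalculatorPlugin}} 
is a made-up no-op test class here, while the config key and plugin base class 
are the standard ones):
{code}
// Sketch for MiniYARNCluster setup: point the container monitor at a no-op
// plugin so it never falls back to the OS-specific default calculator.
conf.setClass(YarnConfiguration.NM_CONTAINER_MON_RESOURCE_CALCULATOR,
    DummyResourceCalculatorPlugin.class,   // made-up no-op implementation
    ResourceCalculatorPlugin.class);
{code}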

> MiniYARNCluster prints multiple  Failed to instantiate default resource 
> calculator warning messages
> ---
>
> Key: YARN-4330
> URL: https://issues.apache.org/jira/browse/YARN-4330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 2.8.0
> Environment: OSX, JUnit
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Blocker
>
> Whenever I try to start a MiniYARNCluster on Branch-2 (commit #0b61cca), I 
> see multiple stack traces warning me that a resource calculator plugin could 
> not be created
> {code}
> (ResourceCalculatorPlugin.java:getResourceCalculatorPlugin(184)) - 
> java.lang.UnsupportedOperationException: Could not determine OS: Failed to 
> instantiate default resource calculator.
> java.lang.UnsupportedOperationException: Could not determine OS
> {code}
> This is a minicluster. It doesn't need resource calculation. It certainly 
> doesn't need test logs being cluttered with even more stack traces which will 
> only generate false alarms about tests failing. 
> There needs to be a way to turn this off, and the minicluster should have it 
> that way by default.
> Being ruthless and marking this as a blocker, because it's a fairly major 
> regression for anyone testing with the minicluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-11-05 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2859:
--
Fix Version/s: 2.6.3

> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.3
>
> Attachments: YARN-2859.txt
>
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4219) New levelDB cache storage for timeline v1.5

2015-11-05 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992814#comment-14992814
 ] 

Xuan Gong commented on YARN-4219:
-

+1. The last patch looks good to me. Let us wait for several days. If there are 
no other comments, I will commit this later.

[~jlowe] and [~jeagles], do we have any comments on this?

> New levelDB cache storage for timeline v1.5
> ---
>
> Key: YARN-4219
> URL: https://issues.apache.org/jira/browse/YARN-4219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4219-trunk.001.patch, YARN-4219-trunk.002.patch, 
> YARN-4219-trunk.003.patch
>
>
> We need to have an "offline" caching storage for timeline server v1.5 after 
> the changes in YARN-3942. The in memory timeline storage may run into OOM 
> issues when used as a cache storage for entity file timeline storage. We can 
> refactor the code and have a level db based caching storage for this use 
> case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4219) New levelDB cache storage for timeline v1.5

2015-11-05 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4219:

Affects Version/s: 2.8.0

> New levelDB cache storage for timeline v1.5
> ---
>
> Key: YARN-4219
> URL: https://issues.apache.org/jira/browse/YARN-4219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4219-trunk.001.patch, YARN-4219-trunk.002.patch, 
> YARN-4219-trunk.003.patch
>
>
> We need to have an "offline" caching storage for timeline server v1.5 after 
> the changes in YARN-3942. The in memory timeline storage may run into OOM 
> issues when used as a cache storage for entity file timeline storage. We can 
> refactor the code and have a level db based caching storage for this use 
> case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4335) Introducing resource request types

2015-11-05 Thread Konstantinos Karanasos (JIRA)
Konstantinos Karanasos created YARN-4335:


 Summary: Introducing resource request types
 Key: YARN-4335
 URL: https://issues.apache.org/jira/browse/YARN-4335
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Konstantinos Karanasos
Assignee: Konstantinos Karanasos


YARN-2882 introduced container types that are internal (not user-facing) and 
are used by the ContainerManager during execution at the NM.

With this JIRA we are introducing (user-facing) resource request types that are 
used by the AM to specify the type of the ResourceRequest.

We will initially support two resource request types: CONSERVATIVE and 
OPTIMISTIC.
CONSERVATIVE resource requests will be handed internally to containers of 
GUARANTEED type, whereas OPTIMISTIC resource requests will be handed to 
QUEUEABLE containers.
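
A hypothetical sketch of the user-facing type and its internal mapping (names 
as described above; nothing here is final):
{code}
// Hypothetical user-facing enum, mapped internally to the YARN-2882 container types.
public enum ResourceRequestType {
  CONSERVATIVE,   // handed internally to GUARANTEED containers
  OPTIMISTIC      // handed internally to QUEUEABLE containers
}
{code}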



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2015-11-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993149#comment-14993149
 ] 

Naganarasimha G R commented on YARN-2556:
-

Hi [~sjlee0] & [~lichangleo],
In a forum user query I came across an issue where each insert was taking more 
time once the existing LevelDB data was large (about 3 GB on disk), at a 
sustained rate of 10 inserts per second over a span of 15 minutes. During each 
insertion, a query is made to pick the date for identifying {{CreationTime}}, 
which I presume is the reason the ATS inserts are slow. Maybe we can optimize 
the test to seed some initial LevelDB data and then measure the performance. 
This would also be useful for evaluating the performance of ATS v1.5. Thoughts?
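
For example, the measurement job could seed the store before the timed run; a 
simplified sketch using the standard timeline store API (the path and sizes 
below are placeholders):
{code}
// Simplified sketch: seed a LeveldbTimelineStore before the timed run so
// inserts are measured against an already-large store.
Configuration conf = new YarnConfiguration();
conf.set(YarnConfiguration.TIMELINE_SERVICE_LEVELDB_PATH, "/tmp/ats-perf-seed"); // placeholder path
TimelineStore store = new LeveldbTimelineStore();
store.init(conf);
store.start();
TimelineEntities batch = new TimelineEntities();
for (int i = 0; i < 1000000; i++) {          // large enough to reach a few GB on disk
  TimelineEntity entity = new TimelineEntity();
  entity.setEntityType("SEED_ENTITY");
  entity.setEntityId("entity_" + i);
  entity.setStartTime(System.currentTimeMillis());
  batch.addEntity(entity);
  if (batch.getEntities().size() == 1000) {  // write in batches of 1000
    store.put(batch);                        // put() throws IOException
    batch = new TimelineEntities();
  }
}
{code}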


> Tool to measure the performance of the timeline server
> --
>
> Key: YARN-2556
> URL: https://issues.apache.org/jira/browse/YARN-2556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Chang Li
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
> YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.11.patch, 
> YARN-2556.12.patch, YARN-2556.13.patch, YARN-2556.13.whitespacefix.patch, 
> YARN-2556.14.patch, YARN-2556.14.whitespacefix.patch, YARN-2556.15.patch, 
> YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, 
> YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, 
> YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch
>
>
> We need to be able to understand the capacity model for the timeline server 
> to give users the tools they need to deploy a timeline server with the 
> correct capacity.
> I propose we create a mapreduce job that can measure timeline server write 
> and read performance. Transactions per second, I/O for both read and write 
> would be a good start.
> This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2882) Introducing container types

2015-11-05 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992845#comment-14992845
 ] 

Konstantinos Karanasos commented on YARN-2882:
--

Following our offline discussion with [~asuresh], we are going to integrate 
this JIRA with YARN-3116.

In particular, we will extend the ContainerType by substituting the TASK field 
with the following two: GUARANTEED and QUEUEABLE.
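
i.e., roughly the following shape (sketch only):
{code}
// Sketch of the proposed ContainerType after this change.
public enum ContainerType {
  APPLICATION_MASTER,  // unchanged (from YARN-3116)
  GUARANTEED,          // replaces TASK: started as soon as it is allocated
  QUEUEABLE            // may be queued at the NM; start can be delayed
}
{code}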

> Introducing container types
> ---
>
> Key: YARN-2882
> URL: https://issues.apache.org/jira/browse/YARN-2882
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: yarn-2882.patch
>
>
> This JIRA introduces the notion of container types.
> We propose two initial types of containers: guaranteed-start and queueable 
> containers.
> Guaranteed-start are the existing containers, which are allocated by the 
> central RM and are instantaneously started, once allocated.
> Queueable is a new type of container, which allows containers to be queued in 
> the NM, thus their execution may be arbitrarily delayed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)