[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573673#comment-14573673
 ] 

Wangda Tan commented on YARN-3769:
--

Thanks [~eepayne], I have reassigned it to myself and will upload a design doc 
shortly for review.

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Wangda Tan

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-3769:


Assignee: Wangda Tan  (was: Eric Payne)

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Wangda Tan

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3768) Index out of range exception with environment variables without values

2015-06-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reassigned YARN-3768:
---

Assignee: zhihai xu

 Index out of range exception with environment variables without values
 --

 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner
Assignee: zhihai xu

 Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
 exception occurs if an environment variable is encountered without a value.
 I believe this occurs because Java's split method will not return trailing 
 empty strings. Similar to this: 
 http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values
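
For illustration only, here is a minimal, self-contained Java snippet (the class name is made up; this is not Hadoop code) showing the split behavior the report points to: trailing empty strings are dropped, so an entry like FOO= yields a one-element array and indexing parts[1] fails.

{code}
public class EnvSplitSketch {
  public static void main(String[] args) {
    String withValue = "FOO=bar";
    String withoutValue = "FOO=";

    System.out.println(withValue.split("=").length);     // prints 2
    System.out.println(withoutValue.split("=").length);  // prints 1, not 2

    // A guard like this avoids the ArrayIndexOutOfBoundsException.
    String[] parts = withoutValue.split("=");
    String value = parts.length > 1 ? parts[1] : "";
    System.out.println("value='" + value + "'");
  }
}
{code}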



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573830#comment-14573830
 ] 

Zhijie Shen commented on YARN-3051:
---

[~varun_saxena], thanks for working on the new patch. It seems to be a complete 
reader-side prototype, which is nice. I still need some time to take a thorough 
look, but I'd like to share my thoughts about the reader APIs.

IMHO, we may want to have or start with two sets of APIs: 1) the APIs to query 
the raw data and 2) the APIs to query the aggregation data.

1) APIs to query the raw data:

We would like to have the APIs to let users zoom into the details of their 
jobs, and give users the freedom to fetch the raw data and do the customized 
processing that ATS will not do. For example, Hive/Pig on Tez need this set of 
APIs to get the framework-specific data, process it and render it on their own 
web UI. We basically need 2 such APIs.

a. Get a single entity given an ID that uniquely locates the entity in the 
backend (We assume the uniqueness is assured somehow). 
* This API can be extended or split into multiple sub-APIs to get a single 
element of the entity, such as events, metrics and configuration.

b. Search for a set of entities that match the given predicates.
* We can start from the predicates that we used in ATS v1 (also for the 
compatibility purpose), but some of them may no longer apply.
* We may want to add more predicates to check the newly added element in v2.
* With more predefined semantics, we can even query entities that belong to 
some container/attempt/application and so on.

2) APIs to query the aggregation data

These are completely new in v2 and are a key advantage. With the aggregation, we 
can answer statistical questions about the job, the user, the queue, the 
flow and the cluster. These APIs do not direct users to the individual 
entities put by the application, but return statistical data (carried by 
Application|User|Queue|Flow|ClusterEntity). 

a. Get certain level aggregation data given the ID of the concept on that 
level, i.e.,  the job, the user, the queue, the flow and the cluster.

b. Search for the jobs, the users, the queues, the flows and the clusters 
given predicates.
* For the predicates, we could learn from the examples in hRaven.
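
To make the two API groups above concrete, here is a rough Java sketch. The interface, the method names and the Entity placeholder are all hypothetical, intended only to illustrate the comment, not the YARN-3051 patch itself.

{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

public interface TimelineReaderSketch {

  /** Placeholder for the v2 timeline entity record. */
  class Entity { /* id, type, events, metrics, configs ... */ }

  // 1) Raw-data APIs: zoom into a single entity, or search by predicates.
  Entity getEntity(String clusterId, String entityType, String entityId)
      throws IOException;

  Set<Entity> getEntities(String clusterId, String entityType,
      Map<String, Object> predicates) throws IOException;

  // 2) Aggregation APIs: statistical data at flow/user/queue/cluster level.
  Entity getAggregate(String level, String id) throws IOException;

  Set<Entity> searchAggregates(String level, Map<String, Object> predicates)
      throws IOException;
}
{code}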


 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051-YARN-2928.003.patch, 
 YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, 
 YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573911#comment-14573911
 ] 

Jian He commented on YARN-2716:
---

Thanks Karthik for working on this! This simplifies things a lot. 
Mostly good; a few comments and questions:
- these two booleans are not used, maybe remove them:
{{private boolean create = false, delete = false;}}
- is this going to be done in this jira?
{code} // TODO: Check deleting appIdRemovePath works recursively
safeDelete(appIdRemovePath);{code}
- will safeDelete throw a NoNodeExists exception if deleting a non-existing 
znode?
- {{new RetryNTimes(numRetries, zkSessionTimeout / numRetries));}}: I think 
the second parameter should be zkRetryInterval (see the sketch after this 
list). Also, I have a question about why, in the HA case, zkRetryInterval is 
calculated as below:
{code}
if (HAUtil.isHAEnabled(conf)) {
  zkRetryInterval = zkSessionTimeout / 
numRetries;
{code}

- I found this 
[thread|http://mail-archives.apache.org/mod_mbox/curator-user/201410.mbox/%3cd076bc8e.9ef1%25sreichl...@chegg.com%3E]
 saying that blockUntilConnected does not need to be called. Supposing it is 
needed, I think the zkSessionTimeout value is too small; it should be 
numRetries * numRetryInterval, otherwise the RM will exit soon after retrying 
for about 10s by default.
{code}
if (!curatorFramework.blockUntilConnected(
zkSessionTimeout, TimeUnit.MILLISECONDS)) {
  LOG.fatal("Couldn't establish connection to ZK server");
  throw new YarnRuntimeException("Couldn't connect to ZK server");
}
{code}
- remove this ?
{code}
//  @Override
//  public ZooKeeper getNewZooKeeper() throws IOException {
//return client;
//  }
{code}
- I think testZKSessionTimeout may be removed too? It looks like a test for 
Curator itself.
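
For reference, a minimal sketch of building a Curator client with RetryNTimes and blockUntilConnected, using the parameter names from the discussion above; the wrapper class is hypothetical and this is not the YARN-2716 patch.

{code}
import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

public class CuratorConnectSketch {
  public static CuratorFramework connect(String zkHostPort, int numRetries,
      int zkRetryIntervalMs, int zkSessionTimeoutMs) throws Exception {
    // Retry numRetries times, sleeping zkRetryIntervalMs between attempts;
    // the sleep is the second constructor argument discussed above.
    CuratorFramework client = CuratorFrameworkFactory.builder()
        .connectString(zkHostPort)
        .sessionTimeoutMs(zkSessionTimeoutMs)
        .retryPolicy(new RetryNTimes(numRetries, zkRetryIntervalMs))
        .build();
    client.start();
    // Wait roughly numRetries * zkRetryIntervalMs rather than only the
    // session timeout, so the caller does not give up too early.
    if (!client.blockUntilConnected(numRetries * zkRetryIntervalMs,
        TimeUnit.MILLISECONDS)) {
      throw new RuntimeException("Couldn't connect to ZK server");
    }
    return client;
  }
}
{code}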


 Refactor ZKRMStateStore retry code with Apache Curator
 --

 Key: YARN-2716
 URL: https://issues.apache.org/jira/browse/YARN-2716
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Karthik Kambatla
 Attachments: yarn-2716-1.patch, yarn-2716-prelim.patch, 
 yarn-2716-prelim.patch, yarn-2716-super-prelim.patch


 Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
 simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3508) Preemption processing occuring on the main RM dispatcher

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573878#comment-14573878
 ] 

Wangda Tan commented on YARN-3508:
--

Trying to better understand this problem: I'm not sure where the bottleneck is. 
If the CapacityScheduler becomes the bottleneck, moving preemption events out 
of the main RM dispatcher doesn't help. This approach only helps when the main 
dispatcher is the bottleneck.

A parallel thing we can do is to reduce the number of preemption events. 
Currently, if a container sits in the to-preempt list, an event is sent to the 
scheduler every few seconds until the container is preempted; we can reduce the 
frequency of this event.

 Preemption processing occuring on the main RM dispatcher
 

 Key: YARN-3508
 URL: https://issues.apache.org/jira/browse/YARN-3508
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Attachments: YARN-3508.002.patch, YARN-3508.01.patch


 We recently saw the RM for a large cluster lag far behind on the 
 AsyncDispacher event queue.  The AsyncDispatcher thread was consistently 
 blocked on the highly-contended CapacityScheduler lock trying to dispatch 
 preemption-related events for RMContainerPreemptEventDispatcher.  Preemption 
 processing should occur on the scheduler event dispatcher thread or a 
 separate thread to avoid delaying the processing of other events in the 
 primary dispatcher queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573670#comment-14573670
 ] 

Eric Payne commented on YARN-3769:
--

[~leftnoteasy]
bq. If you think it's fine, could I take a shot at it?
It sounds like it would work. It's fine with me if you want to work on that.

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573724#comment-14573724
 ] 

zhihai xu commented on YARN-3745:
-

[~lavkesh], thanks for working on this issue. This looks like a good catch.
One question about the patch: why retry on SecurityException? Would retrying 
on NoSuchMethodException alone be enough?
If we do need to retry on SecurityException, can we add a test case for it?
There is a typo in the comment {{This does not has constructor with String 
argument}}; it should be {{have}} instead of {{has}}.
Also, could we make the comment {{Try with String constructor if it fails try 
with default.}} clearer, e.g.
{{Try constructor with String argument, if it fails, try default.}}
Can we add a comment to explain why ClassNotFoundException is expected in 
the test?


 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws 
 NoSuchMethodException, for example for the ClosedChannelException class.
 We should also try to instantiate the exception with the default constructor 
 so that the inner exception can be propagated.
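
A minimal sketch of the fallback described above: try the (String) constructor first and fall back to the default constructor on NoSuchMethodException. Class and method names are illustrative, not the exact YARN-3745 patch.

{code}
import java.lang.reflect.Constructor;

public class InstantiateExceptionSketch {
  static Throwable instantiate(Class<? extends Throwable> cls, String message)
      throws Exception {
    try {
      // Preferred: constructor taking a String message.
      Constructor<? extends Throwable> cn = cls.getConstructor(String.class);
      return cn.newInstance(message);
    } catch (NoSuchMethodException e) {
      // Fallback for classes like ClosedChannelException that only provide
      // a default constructor.
      Constructor<? extends Throwable> cn = cls.getConstructor();
      return cn.newInstance();
    }
  }
}
{code}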



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573593#comment-14573593
 ] 

zhihai xu commented on YARN-3017:
-

Hi [~rohithsharma], thanks for the information.
Sorry, I am not familiar with rolling upgrades. Could you give a little more 
detail about how this could break a rolling upgrade?
That said, I see the ContainerId format was already changed by YARN-2562 in the 
2.6.0 release eight months ago. Compared to the change in YARN-2562, this patch 
is minor: it only changes {{ContainerId#toString}}, and the current 
{{ContainerId#fromString}} supports both the current container string format 
and the new one.
CC [~ozawa] for the impact of the ContainerId format change.

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then when using
 filtering tools to, say, grep events surrounding a specific attempt by the
 numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573667#comment-14573667
 ] 

Wangda Tan commented on YARN-3769:
--

[~eepayne], Exactly.

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573745#comment-14573745
 ] 

zhihai xu commented on YARN-3745:
-

Sorry, there's one more thing I forgot to mention: can we rename 
{{initExceptionWithConstructor}} to {{instantiateExceptionImpl}}?

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws 
 NoSuchMethodException, for example for the ClosedChannelException class.
 We should also try to instantiate the exception with the default constructor 
 so that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)
Eric Payne created YARN-3769:


 Summary: Preemption occurring unnecessarily because preemption 
doesn't consider user limit
 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.7.0, 2.6.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne


We are seeing the preemption monitor preempting containers from queue A and 
then seeing the capacity scheduler giving them immediately back to queue A. 
This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573659#comment-14573659
 ] 

Hudson commented on YARN-3766:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7971 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7971/])
YARN-3766. Fixed the apps table column error of generic history web UI. 
Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java


 ATS Web UI breaks because of YARN-3467
 --

 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Affects Versions: 2.8.0
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.8.0

 Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch


 The ATS web UI breaks because of the following changes made in YARN-3467.
 {code}
 +++ 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
 @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
.append(", 'mRender': renderHadoopDate }")
.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
 if (isFairSchedulerPage) {
-  sb.append("[11]");
+  sb.append("[13]");
 } else if (isResourceManager) {
-  sb.append("[10]");
+  sb.append("[12]");
 } else {
   sb.append("[9]");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573619#comment-14573619
 ] 

Eric Payne commented on YARN-3769:
--

The following configuration will cause this:

|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 40 | 90 | N/A |
| A | 10 | 100 | 20 | 70 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |

One app is running in each queue. Both apps are asking for more resources, but 
they have each reached their user limit, so even though both are asking for 
more and there are resources available, no more resources are allocated to 
either app.

The preemption monitor will see that {{B}} is asking for a lot more resources, 
and it will see that {{B}} is more underserved than {{A}}, so the preemption 
monitor will try to make the queues balance by preempting resources (10, for 
example) from {{A}}.

|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 50 | 80 | N/A |
| A | 10 | 100 | 30 | 60 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |

However, when the capacity scheduler tries to give that container to the app in 
{{B}}, the app will recognize that it has no headroom, and refuse the 
container. So the capacity scheduler offers the container again to the app in 
{{A}}, which accepts it because it has headroom now, and the process starts 
over again.

Note that this happens even when used cluster resources are below 100% because 
the used + pending for the cluster would put it above 100%.

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573633#comment-14573633
 ] 

Zhijie Shen commented on YARN-3766:
---

Patch looks good. Tried it locally and the web UI has been fixed. Will commit 
it.

 ATS Web UI breaks because of YARN-3467
 --

 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Affects Versions: 2.8.0
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch


 The ATS web UI breaks because of the following changes made in YARN-3467.
 {code}
 +++ 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
 @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
.append(", 'mRender': renderHadoopDate }")
.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
 if (isFairSchedulerPage) {
-  sb.append("[11]");
+  sb.append("[13]");
 } else if (isResourceManager) {
-  sb.append("[10]");
+  sb.append("[12]");
 } else {
   sb.append("[9]");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573638#comment-14573638
 ] 

Wangda Tan commented on YARN-3769:
--

[~eepayne],
This is a very interesting problem; actually, user limit is not the only thing 
that causes it.

For example, fair ordering (YARN-3306), hard locality requirements (I want 
resources from rackA and nodeX only), the AM resource limit, and in the near 
future constraints (YARN-3409) can all lead to resources being preempted from 
one queue while the other queue cannot use them because of its specific 
resource requirements and limits.

One thing I've thought about for a while is adding a lazy preemption mechanism: 
when a container is marked preempted and has waited for max_wait_before_time, 
it becomes a can_be_killed container. If another queue can allocate on a node 
with a can_be_killed container, that container will be killed immediately to 
make room for the new containers.

This mechanism means the preemption policy doesn't need to consider complex 
resource requirements and limits inside a queue, and it also avoids killing 
containers unnecessarily.

If you think it's fine, could I take a shot at it?

Thoughts? [~vinodkv].
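
To illustrate the lazy preemption flow described above (all class, field and parameter names here are hypothetical, not an actual patch):

{code}
public class LazyPreemptionSketch {
  enum State { RUNNING, MARKED_PREEMPTED, CAN_BE_KILLED }

  static class MarkedContainer {
    State state = State.MARKED_PREEMPTED;
    long markedAtMs = System.currentTimeMillis();
  }

  // Called periodically by the preemption policy: after max_wait_before_time
  // the container becomes fair game for killing.
  void promote(MarkedContainer c, long maxWaitBeforeTimeMs) {
    if (c.state == State.MARKED_PREEMPTED
        && System.currentTimeMillis() - c.markedAtMs >= maxWaitBeforeTimeMs) {
      c.state = State.CAN_BE_KILLED;
    }
  }

  // Called by the scheduler: kill only when another queue can actually
  // allocate on the node, so no container is killed unnecessarily.
  boolean shouldKill(MarkedContainer c, boolean otherQueueCanAllocateHere) {
    return c.state == State.CAN_BE_KILLED && otherQueueCanAllocateHere;
  }
}
{code}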

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573754#comment-14573754
 ] 

Karthik Kambatla commented on YARN-3453:


Few comments:
# New imports in FairScheduler and FSLeafQueue are not required.
# Looking at the remaining uses of DefaultResourceCalculator in FairScheduler, 
could we benefit from updating all of them to DominantResourceCalculator? 
[~ashwinshankar77] - do you concur? 
# In FairScheduler, changing the scope of RESOURCE_CALCULATOR and 
DOMINANT_RESOURCE_CALCULATOR is not required.
# We should add unit tests to avoid regressions in the future. 
# Nit: In each of the policies, my preference would be to not make the 
calculator and comparator members static unless required. We have had cases 
where our tests would create multiple instances of the class, leading to 
issues. Not that I foresee multiple instantiations of these classes, but I 
would like to avoid that if we can.

 Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
 even in DRF mode causing thrashing
 

 Key: YARN-3453
 URL: https://issues.apache.org/jira/browse/YARN-3453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Ashwin Shankar
Assignee: Arun Suresh
 Attachments: YARN-3453.1.patch, YARN-3453.2.patch


 There are two places in the preemption code flow where DefaultResourceCalculator 
 is used, even in DRF mode, which basically results in more resources getting 
 preempted than needed, and those extra preempted containers aren’t even getting 
 to the “starved” queue since the scheduling logic is based on DRF's calculator.
 Following are the two places:
 1. {code:title=FSLeafQueue.java|borderStyle=solid}
 private boolean isStarved(Resource share)
 {code}
 A queue shouldn’t be marked as “starved” if the dominant resource usage
 is >= fair/minshare (see the sketch after this description).
 2. {code:title=FairScheduler.java|borderStyle=solid}
 protected Resource resToPreempt(FSLeafQueue sched, long curTime)
 {code}
 --
 One more thing that I believe needs to change in DRF mode is: during a 
 preemption round, if preempting a few containers results in satisfying the 
 needs of a resource type, then we should exit that preemption round, since the 
 containers that we just preempted should bring the dominant resource usage to 
 min/fair share.
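
For illustration, a minimal sketch of the dominant-resource-aware starvation check argued for in item 1 above, assuming the Resources and DominantResourceCalculator utilities; this is not the actual FSLeafQueue code.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrfStarvationSketch {
  private static final ResourceCalculator DRC = new DominantResourceCalculator();

  // Not starved if the dominant resource usage is >= the fair/min share.
  static boolean isStarved(Resource clusterResource, Resource usage,
      Resource share) {
    return !Resources.greaterThanOrEqual(DRC, clusterResource, usage, share);
  }
}
{code}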



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573539#comment-14573539
 ] 

zhihai xu commented on YARN-3768:
-

Hi [~joeferner], that is a good find. I can see how the change in MAPREDUCE-5965 
may trigger this bug. I can take up this issue if you don't mind. Thanks for 
reporting it.

 Index out of range exception with environment variables without values
 --

 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner

 Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
 exception occurs if an environment variable is encountered without a value.
 I believe this occurs because Java's split method will not return trailing 
 empty strings. Similar to this: 
 http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3706) Generalize native HBase writer for additional tables

2015-06-04 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated YARN-3706:
---
Attachment: YARN-3726-YARN-2928.005.patch

Uploading YARN-3726-YARN-2928.005.patch

Added proper encoding and decoding of column names and values where a splitter 
is used. We now also encode spaces in the column names, and properly decode 
them on the way out.

Fixed TestHBaseTimelineWriterImpl to confirm that configs now properly work as 
well.
Still need to add reading of metrics, fix a unit test for join (with null as 
separator) of the older join method, and add an entity reader that creates an 
entire entity object from a scan result.

 Generalize native HBase writer for additional tables
 

 Key: YARN-3706
 URL: https://issues.apache.org/jira/browse/YARN-3706
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Minor
 Attachments: YARN-3706-YARN-2928.001.patch, 
 YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, 
 YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch


 When reviewing YARN-3411 we noticed that we could change the class hierarchy 
 a little in order to accommodate additional tables easily.
 In order to get ready for benchmark testing we left the original layout in 
 place, as performance would not be impacted by the code hierarchy.
 Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573664#comment-14573664
 ] 

Eric Payne commented on YARN-3769:
--

[~leftnoteasy],
{quote}
One thing I've thought about for a while is adding a lazy preemption mechanism: 
when a container is marked preempted and has waited for max_wait_before_time, 
it becomes a can_be_killed container. If another queue can allocate on a node 
with a can_be_killed container, that container will be killed immediately to 
make room for the new containers.
{quote}
IIUC, in your proposal, the preemption monitor would mark the containers as 
preemptable, and then after some configurable wait period, the capacity 
scheduler would be the one to do the killing if it finds that it needs the 
resources on that node. Is my understanding correct?

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573182#comment-14573182
 ] 

Wangda Tan commented on YARN-3733:
--

Great! Committing...

 Fix DominantRC#compare() does not work as expected if cluster resource is 
 empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)
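
A simplified illustration of the compare problem (not the real DominantResourceCalculator code): when the cluster resource is empty, a share-based comparison degenerates and unequal resources look equal, so a fallback to absolute values is needed. All names and the exact fallback here are assumptions for this sketch.

{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class EmptyClusterCompareSketch {
  static int compare(Resource cluster, Resource lhs, Resource rhs) {
    if (cluster.getMemory() == 0 && cluster.getVirtualCores() == 0) {
      // Cluster is empty (e.g. right after an RM switch, before NMs register):
      // compare absolute values so lhs and rhs are still distinguished.
      int byMemory = Integer.compare(lhs.getMemory(), rhs.getMemory());
      return byMemory != 0
          ? byMemory
          : Integer.compare(lhs.getVirtualCores(), rhs.getVirtualCores());
    }
    // Otherwise compare by dominant share, as a DRF-style calculator does.
    float l = Math.max((float) lhs.getMemory() / cluster.getMemory(),
        (float) lhs.getVirtualCores() / cluster.getVirtualCores());
    float r = Math.max((float) rhs.getMemory() / cluster.getMemory(),
        (float) rhs.getVirtualCores() / cluster.getVirtualCores());
    return Float.compare(l, r);
  }
}
{code}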



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena moved HADOOP-12062 to YARN-3767:
-

  Component/s: (was: tools)
Affects Version/s: (was: 2.7.0)
   2.7.0
  Key: YARN-3767  (was: HADOOP-12062)
  Project: Hadoop YARN  (was: Hadoop Common)

 Yarn Scheduler Load Simulator does not work
 ---

 Key: YARN-3767
 URL: https://issues.apache.org/jira/browse/YARN-3767
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: OS X 10.10.  JDK 1.7
Reporter: David Kjerrumgaard
Assignee: Varun Saxena

 Running the SLS, as per the instructions on the web results in a 
 NullPointerException being thrown.
 Steps followed to create error:
 1) Download Apache Hadoop 2.7.0 tarball from Apache site
 2) Untar 2.7.0 tarball into /opt directory
 3) Execute the following command: 
 /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
 --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
  --output-dir=/tmp
 Results in the following error:
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2118.smile.com:2 clusterResource: memory:30720, vCores:30
 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
 /default-rack
 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
 from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
 memory:10240, vCores:10, assigned nodeId a2115.smile.com:3
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2115.smile.com:3 clusterResource: memory:40960, vCores:40
 Exception in thread main java.lang.RuntimeException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
   at 
 org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-06-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2513:
--
Target Version/s: 2.8.0

 Host framework UIs in YARN for use with the ATS
 ---

 Key: YARN-2513
 URL: https://issues.apache.org/jira/browse/YARN-2513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
 YARN-2513.v3.patch


 Allow for pluggable UIs as described by TEZ-8. Yarn can provide the 
 infrastructure to host java script and possible java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573288#comment-14573288
 ] 

Zhijie Shen commented on YARN-2513:
---

As it's valuable to some existing ATS use cases, let's try to get the patch in 
and target 2.8.

[~jeagles], three comments about the patch:

1. Shall we add yarn.timeline-service.ui-names to yarn-default.xml too, like 
yarn.nodemanager.aux-services?

2. Can we add some text in TimelineServer.md to document the configs and 
explain how to install framework UIs?

3. Can we add a test case to validate and showcase that ATS can load a 
framework UI (e.g., a single helloworld.html)?

 Host framework UIs in YARN for use with the ATS
 ---

 Key: YARN-2513
 URL: https://issues.apache.org/jira/browse/YARN-2513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
 YARN-2513.v3.patch


 Allow for pluggable UIs as described by TEZ-8. Yarn can provide the 
 infrastructure to host java script and possible java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573269#comment-14573269
 ] 

Varun Saxena commented on YARN-3767:


This belongs to YARN.

 Yarn Scheduler Load Simulator does not work
 ---

 Key: YARN-3767
 URL: https://issues.apache.org/jira/browse/YARN-3767
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: OS X 10.10.  JDK 1.7
Reporter: David Kjerrumgaard
Assignee: Varun Saxena

 Running the SLS, as per the instructions on the web results in a 
 NullPointerException being thrown.
 Steps followed to create error:
 1) Download Apache Hadoop 2.7.0 tarball from Apache site
 2) Untar 2.7.0 tarball into /opt directory
 3) Execute the following command: 
 /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
 --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
  --output-dir=/tmp
 Results in the following error:
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2118.smile.com:2 clusterResource: memory:30720, vCores:30
 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
 /default-rack
 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
 from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
 memory:10240, vCores:10, assigned nodeId a2115.smile.com:3
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2115.smile.com:3 clusterResource: memory:40960, vCores:40
 Exception in thread main java.lang.RuntimeException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
   at 
 org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573273#comment-14573273
 ] 

Varun Saxena commented on YARN-3767:


Yes, it will work if you copy {{sls-runner.xml}} to {{etc/hadoop}}. This is 
mentioned in the documentation as well. Refer to: 
http://hadoop.apache.org/docs/r2.4.1/hadoop-sls/SchedulerLoadSimulator.html#Step_1:_Configure_Hadoop_and_the_simulator

It mentions: "Before we start, make sure Hadoop and the simulator are configured 
well. All configuration files for Hadoop and the simulator should be placed in 
directory $HADOOP_ROOT/etc/hadoop, where the ResourceManager and Yarn scheduler 
load their configurations. Directory 
$HADOOP_ROOT/share/hadoop/tools/sls/sample-conf/ provides several example 
configurations, that can be used to start a demo."

 Yarn Scheduler Load Simulator does not work
 ---

 Key: YARN-3767
 URL: https://issues.apache.org/jira/browse/YARN-3767
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: OS X 10.10.  JDK 1.7
Reporter: David Kjerrumgaard
Assignee: Varun Saxena

 Running the SLS, as per the instructions on the web results in a 
 NullPointerException being thrown.
 Steps followed to create error:
 1) Download Apache Hadoop 2.7.0 tarball from Apache site
 2) Untar 2.7.0 tarball into /opt directory
 3) Execute the following command: 
 /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
 --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
  --output-dir=/tmp
 Results in the following error:
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2118.smile.com:2 clusterResource: memory:30720, vCores:30
 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
 /default-rack
 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
 from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
 memory:10240, vCores:10, assigned nodeId a2115.smile.com:3
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2115.smile.com:3 clusterResource: memory:40960, vCores:40
 Exception in thread main java.lang.RuntimeException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
   at 
 org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3698) Make task attempt log files accessible from webapps

2015-06-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated YARN-3698:
-
Description: 
Currently we don't have direct access to an attempt's log file from web apps. 
The only available option is through jobhistory, and that provides an HTML view 
of the log.

Requirements:
# A link to access the raw log file.
# A variant of the link with the following headers set, this enables direct 
download of the file across all browsers.
Content-Disposition: attachment; filename=attempt-id.log
Content-Type of text/plain
# Node manager redirects an attempt syslog view to the container view. Hence we 
are not able to view the logs of a specific attempt.
Before redirection: 
http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0
After redirection: 
http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root

  was:
Currently we don't have direct access to an attempt's log file from web apps. 
The only available option is through jobhistory, and that provides an HTML view 
of the log.

Requirements:
# A link to access the raw log file.
# A variant of the link with the following headers set, this enables direct 
download of the file across all browsers.
Content-Disposition: attachment; filename=attempt-id.log
Content-Type of text/plain


 Make task attempt log files accessible from webapps
 ---

 Key: YARN-3698
 URL: https://issues.apache.org/jira/browse/YARN-3698
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Sreenath Somarajapuram

 Currently we don't have direct access to an attempt's log file from web apps. 
 The only available option is through jobhistory, and that provides an HTML 
 view of the log.
 Requirements:
 # A link to access the raw log file.
 # A variant of the link with the following headers set, this enables direct 
 download of the file across all browsers.
 Content-Disposition: attachment; filename=attempt-id.log
 Content-Type of text/plain
 # Node manager redirects an attempt syslog view to the container view. Hence 
 we are not able to view the logs of a specific attempt.
 Before redirection: 
 http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0
 After redirection: 
 http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root
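
 For requirement #2 above, a minimal servlet-style sketch that sets the listed headers so browsers download the raw log as plain text; the class and method names are hypothetical, not an actual YARN webapp.

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.servlet.http.HttpServletResponse;

public class RawLogDownloadSketch {
  static void writeRawLog(HttpServletResponse response, String attemptId,
      InputStream log) throws IOException {
    // Headers from requirement #2: plain text plus attachment disposition.
    response.setContentType("text/plain");
    response.setHeader("Content-Disposition",
        "attachment; filename=" + attemptId + ".log");
    OutputStream out = response.getOutputStream();
    byte[] buf = new byte[8192];
    int n;
    while ((n = log.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    out.flush();
  }
}
{code}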



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3767:
---
Assignee: (was: Varun Saxena)

 Yarn Scheduler Load Simulator does not work
 ---

 Key: YARN-3767
 URL: https://issues.apache.org/jira/browse/YARN-3767
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: OS X 10.10.  JDK 1.7
Reporter: David Kjerrumgaard

 Running the SLS, as per the instructions on the web results in a 
 NullPointerException being thrown.
 Steps followed to create error:
 1) Download Apache Hadoop 2.7.0 tarball from Apache site
 2) Untar 2.7.0 tarball into /opt directory
 3) Execute the following command: 
 /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
 --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
  --output-dir=/tmp
 Results in the following error:
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2118.smile.com:2 clusterResource: memory:30720, vCores:30
 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
 /default-rack
 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
 from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
 memory:10240, vCores:10, assigned nodeId a2115.smile.com:3
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2115.smile.com:3 clusterResource: memory:40960, vCores:40
 Exception in thread main java.lang.RuntimeException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
   at 
 org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3733:
-
Summary: Fix DominantRC#compare() does not work as expected if cluster 
resource is empty  (was: DominantRC#compare() does not work as expected if 
cluster resource is empty)

 Fix DominantRC#compare() does not work as expected if cluster resource is 
 empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573202#comment-14573202
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7965 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7965/])
YARN-3733. Fix DominantRC#compare() does not work as expected if cluster 
resource is empty. (Rohith Sharmaks via wangda) (wangda: rev 
ebd797c48fe236b404cf3a125ac9d1f7714e291e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java


 Fix DominantRC#compare() does not work as expected if cluster resource is 
 empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573325#comment-14573325
 ] 

Hudson commented on YARN-2392:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7968 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7968/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by 
Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


 add more diags about app retry limits on AM failures
 

 Key: YARN-2392
 URL: https://issues.apache.org/jira/browse/YARN-2392
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
 YARN-2392-002.patch


 # when an app fails the failure count is shown, but not what the global + 
 local limits are. If the two are different, they should both be printed. 
 # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573276#comment-14573276
 ] 

Hudson commented on YARN-3764:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7966 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7966/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to 
another. Contributed by Wangda Tan (jianhe: rev 
6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java


 CapacityScheduler should forbid moving LeafQueue from one parent to another
 ---

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.7.1

 Attachments: YARN-3764.1.patch


 Currently CapacityScheduler doesn't handle this case well. For example, given 
 a queue structure:
 {code}
      root
       |
     a (100)
     /     \
  x (50)  y (50)
 {code}
 And reinitialize using the following structure:
 {code}
      root
     /     \
  a (50)  x (50)
    |
  y (100)
 {code}
 The actual queue structure after reinitialization is:
 {code}
      root
     /     \
  a (50)  x (50)
   /    \
 x (50) y (100)
 {code}
 We should forbid the admin from doing that.
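
A hedged sketch of the kind of validation this implies: during reinitialization, 
compare each existing leaf queue's parent path against the new configuration and 
reject the refresh if a leaf has moved. The class and method names are invented 
for illustration and are not the CapacityScheduler API.

{code}
import java.util.Map;

// Illustrative sketch (names invented, this is not the CapacityScheduler code):
// reject a queue refresh that moves an existing leaf queue under a new parent.
public class QueueMoveValidatorSketch {

  /**
   * @param oldParents leaf queue name -> full parent path before the refresh
   * @param newParents leaf queue name -> full parent path after the refresh
   */
  static void validateNoLeafQueueMoved(Map<String, String> oldParents,
                                       Map<String, String> newParents) {
    for (Map.Entry<String, String> e : oldParents.entrySet()) {
      String newParent = newParents.get(e.getKey());
      if (newParent != null && !newParent.equals(e.getValue())) {
        throw new IllegalArgumentException("Moving leaf queue '" + e.getKey()
            + "' from '" + e.getValue() + "' to '" + newParent
            + "' is not allowed during reinitialization");
      }
    }
  }

  public static void main(String[] args) {
    // Mirrors the example above: y stays under root.a, but x is pulled out to
    // root, so the refresh must be rejected.
    try {
      validateNoLeafQueueMoved(
          Map.of("x", "root.a", "y", "root.a"),
          Map.of("x", "root", "y", "root.a"));
    } catch (IllegalArgumentException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{code}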



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573293#comment-14573293
 ] 

Jian He commented on YARN-2392:
---

looks good, committing

 add more diags about app retry limits on AM failures
 

 Key: YARN-2392
 URL: https://issues.apache.org/jira/browse/YARN-2392
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
 YARN-2392-002.patch


 # When an app fails, the failure count is shown, but not what the global and 
 local limits are. If the two are different, they should both be printed. 
 # The YARN-2242 strings don't have enough whitespace between the text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3698) Make task attempt log files accessible from webapps & correct node-manager redirection

2015-06-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated YARN-3698:
-
Summary: Make task attempt log files accessible from webapps & correct 
node-manager redirection  (was: Make task attempt log files accessible from 
webapps)

 Make task attempt log files accessible from webapps & correct node-manager 
 redirection
 --

 Key: YARN-3698
 URL: https://issues.apache.org/jira/browse/YARN-3698
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Sreenath Somarajapuram

 Currently we don't have direct access to an attempt's log file from web apps. 
 The only available option is through jobhistory, and that provides an HTML 
 view of the log.
 Requirements:
 # A link to access the raw log file.
 # A variant of the link with the following headers set; this enables direct 
 download of the file across all browsers (see the sketch after this list).
 Content-Disposition: attachment; filename=attempt-id.log
 Content-Type of text/plain
 # Node manager redirects an attempt syslog view to the container view. Hence 
 we are not able to view the logs of a specific attempt.
 Before redirection: 
 http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0
 After redirection: 
 http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root
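
A minimal servlet-style sketch of requirement 2; the header values come from the 
list above, while the servlet class, its mapping and the log location are 
assumptions made for illustration (a servlet-api dependency on the classpath is 
also assumed).

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal sketch of requirement 2: serve the raw attempt log with headers that
// force a download in every browser. The servlet and the log location are
// placeholders; only the two header values come from the requirement above.
public class RawAttemptLogServletSketch extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    String attemptId = req.getParameter("attemptId");

    // Plain text, delivered as an attachment so browsers download the file
    // instead of rendering an HTML view.
    resp.setContentType("text/plain");
    resp.setHeader("Content-Disposition",
        "attachment; filename=" + attemptId + ".log");

    try (OutputStream out = resp.getOutputStream()) {
      Files.copy(Paths.get("/tmp/logs", attemptId + ".log"), out);  // placeholder path
    }
  }
}
{code}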



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573013#comment-14573013
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573011#comment-14573011
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when the RM fails over, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
                                            YarnConfiguration.RM_ADDRESS,
                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
                                            server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and we init both 
 RMs before starting either of them, yarn.resourcemanager.ha.id is changed to rm2 
 during the init of rm2 and is therefore still rm2 while rm1 is starting.
 So I think it is safe to make a copy of the configuration when initializing each 
 RM (see the sketch below).
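
A minimal sketch of that proposal, assuming only the standard 
Configuration/YarnConfiguration copy constructor; the MiniRm wrapper below is 
invented for illustration and is not the MiniYARNCluster code.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch of the proposal: give each RM its own copy of the configuration so
// that updateConnectAddr() in one RM cannot rewrite the other RM's address.
// The MiniRm wrapper is invented for illustration; only the Configuration
// copy constructor and RM_HA_ID are standard Hadoop API.
public class PerRmConfigCopySketch {

  static class MiniRm {
    final Configuration conf;
    MiniRm(Configuration conf) {
      this.conf = conf;
    }
  }

  static MiniRm[] createRms(Configuration baseConf, String... rmIds) {
    MiniRm[] rms = new MiniRm[rmIds.length];
    for (int i = 0; i < rmIds.length; i++) {
      // One copy per RM: mutations made while initializing one RM stay local to it.
      Configuration copy = new YarnConfiguration(baseConf);
      copy.set(YarnConfiguration.RM_HA_ID, rmIds[i]);
      rms[i] = new MiniRm(copy);
    }
    return rms;
  }

  public static void main(String[] args) {
    Configuration base = new YarnConfiguration();
    base.set("yarn.resourcemanager.address.rm1", "0.0.0.0:18032");
    base.set("yarn.resourcemanager.address.rm2", "0.0.0.0:28032");
    MiniRm[] rms = createRms(base, "rm1", "rm2");
    // rm2 keeps its own address even if rm1 later calls updateConnectAddr().
    System.out.println(rms[1].conf.get("yarn.resourcemanager.address.rm2"));
  }
}
{code}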



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573017#comment-14573017
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/CHANGES.txt


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission the NodeManager log shows the stop, 
 but the process cannot exit. 
 Non-daemon threads:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and jni leveldb thread stack
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}
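
A generic illustration (not the committed fix): once all services are stopped, 
terminate the JVM explicitly so a lingering non-daemon native thread, like the 
leveldb JNI thread in the dump above, cannot keep the process alive. The 
Stoppable interface and the recovery flag are invented for this sketch.

{code}
// Generic illustration, not the committed fix: once all services are stopped,
// terminate the JVM explicitly so a lingering non-daemon native thread (like
// the leveldb JNI thread above) cannot keep the process alive.
public class ForceExitAfterShutdownSketch {

  interface Stoppable {
    void stop();
  }

  static void shutdown(Stoppable nodeManager, boolean recoveryEnabled) {
    nodeManager.stop();
    if (recoveryEnabled) {
      // With recovery enabled the state-store helper thread is non-daemon and
      // may never finish on its own; exit explicitly after a clean stop.
      System.exit(0);
    }
  }

  public static void main(String[] args) {
    shutdown(() -> System.out.println("services stopped"), false);
  }
}
{code}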



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2573) Integrate ReservationSystem with the RM failover mechanism

2015-06-04 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2573:

Attachment: Design for Reservation HA.pdf

Attaching design for the umbrella jira 

 Integrate ReservationSystem with the RM failover mechanism
 --

 Key: YARN-2573
 URL: https://issues.apache.org/jira/browse/YARN-2573
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Attachments: Design for Reservation HA.pdf


 YARN-1051 introduces the ReservationSystem and the current implementation is 
 completely in-memory based. YARN-149 brings in the notion of RM HA with a 
 highly available state store. This JIRA proposes persisting the Plan into the 
 RMStateStore and recovering it post RM failover
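
Purely as a hypothetical illustration (the interface, method names and byte[] 
encoding below are invented and are not the design in the attached document), 
persisting and recovering a plan could look roughly like this:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of persisting reservations; all names and the encoding
// are invented here and do not reflect the attached design document.
public class ReservationPersistenceSketch {

  interface ReservationStateStore {
    void storeReservation(String planName, String reservationId, byte[] allocation);
    void removeReservation(String planName, String reservationId);
    Map<String, byte[]> loadPlan(String planName);
  }

  // In-memory stand-in for an RMStateStore-backed implementation.
  static class InMemoryStore implements ReservationStateStore {
    private final Map<String, Map<String, byte[]>> plans = new ConcurrentHashMap<>();

    public void storeReservation(String plan, String id, byte[] alloc) {
      plans.computeIfAbsent(plan, p -> new ConcurrentHashMap<>()).put(id, alloc);
    }

    public void removeReservation(String plan, String id) {
      Map<String, byte[]> p = plans.get(plan);
      if (p != null) {
        p.remove(id);
      }
    }

    public Map<String, byte[]> loadPlan(String plan) {
      return plans.getOrDefault(plan, Map.of());
    }
  }

  public static void main(String[] args) {
    ReservationStateStore store = new InMemoryStore();
    store.storeReservation("root.reservations", "reservation_1", new byte[]{1, 2, 3});
    // On RM failover, the newly active RM would replay loadPlan() into the Plan.
    System.out.println(store.loadPlan("root.reservations").size());
  }
}
{code}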



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573006#comment-14573006
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}
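
One generic way to avoid this kind of CME is sketched below: guard the 
child-queue list with a read/write lock so that iteration and mutation cannot 
interleave. The class and field names are illustrative and are not the 
FSParentQueue implementation.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic sketch: guard the child-queue list with a read/write lock so that
// iteration (getQueueUserAclInfo) and mutation (adding a child queue) cannot
// interleave. Names are illustrative, not the FSParentQueue code.
public class GuardedChildQueueListSketch {
  private final List<String> childQueues = new ArrayList<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void addChildQueue(String name) {
    lock.writeLock().lock();
    try {
      childQueues.add(name);
    } finally {
      lock.writeLock().unlock();
    }
  }

  List<String> getQueueUserAclInfo() {
    lock.readLock().lock();
    try {
      // Iterate (and copy) under the read lock; concurrent adds now block
      // instead of invalidating the iterator.
      return new ArrayList<>(childQueues);
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    GuardedChildQueueListSketch root = new GuardedChildQueueListSketch();
    root.addChildQueue("root.testyarnpool3");
    System.out.println(root.getQueueUserAclInfo());
  }
}
{code}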



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573012#comment-14573012
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 

[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573015#comment-14573015
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.
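
A tiny sketch of the null-guard this fix describes; the class and field names 
are invented, and only the idea of falling back to an "unavailable" value when 
the timeline store has no usage report is taken from the issue.

{code}
// Tiny sketch of the null-guard: when the timeline store has no resource-usage
// report, fall back to defaults instead of dereferencing null. Names invented.
public class AppInfoSketch {

  static final int UNAVAILABLE = -1;

  static class UsageReport {
    long memorySeconds;
    int vcoreSeconds;
  }

  final long memorySeconds;
  final int vcoreSeconds;

  AppInfoSketch(UsageReport usage) {
    if (usage != null) {
      memorySeconds = usage.memorySeconds;
      vcoreSeconds = usage.vcoreSeconds;
    } else {
      // Information not published to the timeline server: report "unavailable"
      // rather than throwing an NPE in the web services layer.
      memorySeconds = UNAVAILABLE;
      vcoreSeconds = UNAVAILABLE;
    }
  }

  public static void main(String[] args) {
    System.out.println(new AppInfoSketch(null).memorySeconds);
  }
}
{code}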



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573041#comment-14573041
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573040#comment-14573040
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 

[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573045#comment-14573045
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission the NodeManager log shows the stop, 
 but the process cannot exit. 
 Non-daemon threads:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and jni leveldb thread stack
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573043#comment-14573043
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573034#comment-14573034
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573039#comment-14573039
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when the RM fails over, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
                                            YarnConfiguration.RM_ADDRESS,
                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
                                            server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and we init both 
 RMs before starting either of them, yarn.resourcemanager.ha.id is changed to rm2 
 during the init of rm2 and is therefore still rm2 while rm1 is starting.
 So I think it is safe to make a copy of the configuration when initializing each 
 RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572894#comment-14572894
 ] 

Hudson commented on YARN-3749:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when the RM fails over, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
                                            YarnConfiguration.RM_ADDRESS,
                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
                                            server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and we init both 
 RMs before starting either of them, yarn.resourcemanager.ha.id is changed to rm2 
 during the init of rm2 and is therefore still rm2 while rm1 is starting.
 So I think it is safe to make a copy of the configuration when initializing each 
 RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-41:
---
Release Note: The behavior of shutting down an NM can differ (if NM work 
preserving is not enabled): the NM will unregister from the RM immediately rather 
than waiting for the timeout and being marked LOST. A new node status, SHUTDOWN, 
is introduced, which could affect the UI, CLI and ClusterMetrics for a node's 
status. 

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Fix For: 2.8.0

 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, the RM should remove and handle an NM 
 that is shut down gracefully (a high-level sketch follows below).
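
A high-level sketch of the behaviour described in the release note above; the 
types and method names below are simplified stand-ins rather than the actual 
ResourceTracker protocol.

{code}
// High-level sketch (simplified stand-ins, not the ResourceTracker protocol):
// on a graceful stop the NM tells the RM it is going away, and the RM marks
// the node SHUTDOWN immediately instead of waiting for the liveness timeout.
public class GracefulNmShutdownSketch {

  enum NodeState { RUNNING, SHUTDOWN, LOST }

  interface ResourceTrackerStub {
    void unRegisterNodeManager(String nodeId);
  }

  static class RmStub implements ResourceTrackerStub {
    NodeState state = NodeState.RUNNING;
    public void unRegisterNodeManager(String nodeId) {
      // No expiry wait: the node transitions straight to SHUTDOWN.
      state = NodeState.SHUTDOWN;
      System.out.println(nodeId + " -> " + state);
    }
  }

  static void stopNodeManager(ResourceTrackerStub rm, String nodeId,
                              boolean workPreservingEnabled) {
    if (!workPreservingEnabled) {
      rm.unRegisterNodeManager(nodeId);
    }
    // ... stop services, close the state store, etc.
  }

  public static void main(String[] args) {
    stopNodeManager(new RmStub(), "host-1:45454", false);
  }
}
{code}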



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572889#comment-14572889
 ] 

Hudson commented on YARN-3762:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572899#comment-14572899
 ] 

Hudson commented on YARN-3585:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission the NodeManager log shows the stop, 
 but the process cannot exit. 
 Non-daemon threads:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and jni leveldb thread stack
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572895#comment-14572895
 ] 

Hudson commented on YARN-1462:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572861#comment-14572861
 ] 

Hudson commented on YARN-3762:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}
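 One way to avoid this kind of CME, shown here only as a sketch (the lock field and method shape are assumptions, not necessarily the committed fix), is to guard the child-queue list with a read/write lock so iteration never overlaps with mutation:
 {code}
 // Hypothetical fragment of FSParentQueue: protect childQueues with a
 // ReentrantReadWriteLock instead of iterating the list unguarded.
 private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
 private final List<FSQueue> childQueues = new ArrayList<>();

 public List<QueueUserACLInfo> getQueueUserAclInfo(UserGroupInformation user) {
   List<QueueUserACLInfo> acls = new ArrayList<>();
   rwLock.readLock().lock();   // writers (add/remove queue) take the write lock
   try {
     for (FSQueue child : childQueues) {
       acls.addAll(child.getQueueUserAclInfo(user));
     }
   } finally {
     rwLock.readLock().unlock();
   }
   return acls;
 }
 {code}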



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572866#comment-14572866
 ] 

Hudson commented on YARN-3749:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when RM failover happens, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 is 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
     YarnConfiguration.RM_ADDRESS,
     YarnConfiguration.DEFAULT_RM_ADDRESS,
     server.getListenerAddress());
 {code}
 Since we use the same configuration instance for rm1 and rm2, and we init both 
 RMs before we start them, yarn.resourcemanager.ha.id is changed to rm2 during 
 the init of rm2 and is therefore still rm2 while rm1 is starting.
 So I think it is safe to make a copy of the configuration when initializing 
 both of the RMs.
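 A minimal sketch of that idea (the variable names and the resourceManagers array are assumptions here, not the committed patch): give each RM in the mini cluster its own copy of the configuration so that updateConnectAddr() inside one RM cannot rewrite the address keys seen by the other.
 {code}
 // Hypothetical sketch: per-RM copies of the shared test configuration.
 Configuration rm1Conf = new YarnConfiguration(conf);   // copy, not alias
 rm1Conf.set(YarnConfiguration.RM_HA_ID, "rm1");
 Configuration rm2Conf = new YarnConfiguration(conf);   // copy, not alias
 rm2Conf.set(YarnConfiguration.RM_HA_ID, "rm2");

 resourceManagers[0].init(rm1Conf);   // rm1 keeps 0.0.0.0:18032
 resourceManagers[1].init(rm2Conf);   // rm2 keeps 0.0.0.0:28032
 {code}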



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572871#comment-14572871
 ] 

Hudson commented on YARN-3585:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission, nodemanager log show stop but 
 process cannot end. 
 non daemon thread:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and jni leveldb thread stack
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572869#comment-14572869
 ] 

Hudson commented on YARN-3751:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java
* hadoop-yarn-project/CHANGES.txt


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.
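 A minimal sketch of the kind of null guard involved (the AppInfo field names below are assumptions): since the timeline server does not publish used resources, AppInfo should only read them when the usage report is present.
 {code}
 // Hypothetical fragment of AppInfo: guard against a missing usage report.
 ApplicationResourceUsageReport usage = app.getApplicationResourceUsageReport();
 if (usage != null && usage.getUsedResources() != null) {
   allocatedMemoryMB = usage.getUsedResources().getMemory();
   allocatedVCores = usage.getUsedResources().getVirtualCores();
 }   // otherwise leave the allocated* fields at their defaults
 {code}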



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572867#comment-14572867
 ] 

Hudson commented on YARN-1462:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* hadoop-yarn-project/CHANGES.txt


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572897#comment-14572897
 ] 

Hudson commented on YARN-3751:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572289#comment-14572289
 ] 

Rohith commented on YARN-3017:
--

Apologies for coming very late into this issue. I am thinking that changing the 
containerId format may break compatibility when a rolling upgrade has been done 
with RM HA + work preserving enabled? IIUC, using ZKRMStateStore, a rolling 
upgrade can be done now.

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while 
 using filtering tools to, say, grep events surrounding a specific attempt by 
 the numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572239#comment-14572239
 ] 

Sunil G commented on YARN-3733:
---

Patch looks good to me. +1

For MockRM.submitApp, I think we need to support the addition of cores and 
memory. I will file a separate ticket to handle the same if that's fine.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent tasks 
 5. Switch RM
 Actual
 =
 For 12 jobs the AM gets allocated and all 12 start running
 No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
 Expected
 ===
 Only 6 should be running at a time since the max AM allocation is .5 (3072 MB)
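 As a standalone illustration of why a dominant-share comparison can degenerate when the cluster resource is empty (plain Java arithmetic, not a quote of the YARN code or the patch): the per-resource shares are obtained by dividing by the cluster totals, and dividing by zero collapses the comparison.
 {code}
 // With an empty cluster resource both shares become Infinity (or NaN for 0/0),
 // so "lhs < rhs" and "lhs > rhs" are both false and compare() acts as "equal".
 float clusterMemory = 0f, clusterVcores = 0f;
 float lhsShare = Math.max(1024f / clusterMemory, 1f / clusterVcores);  // Infinity
 float rhsShare = Math.max(2048f / clusterMemory, 2f / clusterVcores);  // Infinity
 System.out.println(lhsShare < rhsShare);   // false
 System.out.println(lhsShare > rhsShare);   // false -> treated as equal
 {code}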



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572240#comment-14572240
 ] 

Hadoop QA commented on YARN-41:
---

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m 45s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 12 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 53s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 33s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 55s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   6m  3s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  50m 16s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 52s | Tests passed in 
hadoop-yarn-server-tests. |
| | | 107m 51s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735565/YARN-41-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b5f0d29 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/console |


This message was automatically generated.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572248#comment-14572248
 ] 

zhihai xu commented on YARN-3017:
-

+1 non-binding
The checkstyle issue looks like a script issue
{code}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java:119:50:
 Name 'epochFormat' must match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'.
{code}
I think the pattern used to match the name is not correct.
Thanks again for working on this.
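For illustration only (a tiny standalone check, not part of the patch), the constant-name pattern quoted above does reject a camelCase field name such as 'epochFormat', which is why the warning fires:
{code}
import java.util.regex.Pattern;

public class ConstantNameCheckDemo {
  public static void main(String[] args) {
    // Pattern reported by the checkstyle warning above.
    Pattern constantName = Pattern.compile("^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$");
    System.out.println(constantName.matcher("epochFormat").matches());   // false
    System.out.println(constantName.matcher("EPOCH_FORMAT").matches());  // true
  }
}
{code}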

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while 
 using filtering tools to, say, grep events surrounding a specific attempt by 
 the numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572247#comment-14572247
 ] 

Rohith commented on YARN-3733:
--

+1 for handling virtual cores. This will be a good improvement for testing 
DominantRC functionality precisely.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent tasks 
 5. Switch RM
 Actual
 =
 For 12 jobs the AM gets allocated and all 12 start running
 No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
 Expected
 ===
 Only 6 should be running at a time since the max AM allocation is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572318#comment-14572318
 ] 

Hadoop QA commented on YARN-41:
---

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 58s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 12 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m  1s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 35s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m  3s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 30s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   6m 16s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  50m 29s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 52s | Tests passed in 
hadoop-yarn-server-tests. |
| | | 110m 24s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735565/YARN-41-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1bb79c9 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/console |


This message was automatically generated.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572244#comment-14572244
 ] 

Rohith commented on YARN-3754:
--

bq. When NM is shutting down, ContainerLaunch is also interrupted. During this 
interrupted exception handling, NM tries to update container diagnostics. But 
from main thread statestore is down ,hence caused the DB Close exception.
I think this issue was caused because the NM JVM did not exit on time, which 
allowed the statestore event to be processed. After YARN-3585, I think this 
should be OK. [~bibinchundatt] Can you please re-run the regression?

 Race condition when the NodeManager is shutting down and container is launched
 --

 Key: YARN-3754
 URL: https://issues.apache.org/jira/browse/YARN-3754
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Sunil G
Priority: Critical
 Attachments: NM.log


 Container is launched and returned to ContainerImpl.
 NodeManager closed the DB connection, which results in 
 {{org.iq80.leveldb.DBException: Closed}}. 
 *Attaching the exception trace*
 {code}
 2015-05-30 02:11:49,122 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
  Unable to update state store diagnostics for 
 container_e310_1432817693365_3338_01_02
 java.io.IOException: org.iq80.leveldb.DBException: Closed
 at 
 org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: org.iq80.leveldb.DBException: Closed
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
 at 
 org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259)
 ... 15 more
 {code}
 We can add a check for whether the DB is closed while we move the container 
 from the ACQUIRED state.
 As per the discussion in YARN-3585, adding the same here.
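 As a rough sketch of that check (the flag name and exact placement are assumptions, not a committed patch), the NM state store could simply become a no-op once it has been closed during shutdown:
 {code}
 // Hypothetical guard inside the NM leveldb state store.
 private final AtomicBoolean closed = new AtomicBoolean(false);

 @Override
 protected void closeStorage() throws IOException {
   closed.set(true);
   // ... existing code that closes the leveldb handle ...
 }

 @Override
 public void storeContainerDiagnostics(ContainerId containerId,
     StringBuilder diagnostics) throws IOException {
   if (closed.get()) {
     return;   // NM is shutting down; dropping a diagnostics update is harmless
   }
   // ... existing db.put(...) of the diagnostics bytes ...
 }
 {code}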



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-04 Thread Chun Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chen updated YARN-2674:

Attachment: YARN-2674.3.patch

 Distributed shell AM may re-launch containers if RM work preserving restart 
 happens
 ---

 Key: YARN-2674
 URL: https://issues.apache.org/jira/browse/YARN-2674
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chun Chen
Assignee: Chun Chen
 Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch


 Currently, if an RM work-preserving restart happens while distributed shell is 
 running, the distributed shell AM may re-launch all the containers, including 
 new/running/complete ones. We must make sure it won't re-launch the 
 running/complete containers.
 We need to remove allocated containers from 
 AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 
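 A sketch of the client-side pattern this implies (the pendingRequests queue and launchContainer helper are hypothetical, not code from the patch): when containers are allocated, the AM removes the matching request so it is not re-sent after an RM work-preserving restart.
 {code}
 // Hypothetical AMRMClientAsync callback fragment.
 @Override
 public void onContainersAllocated(List<Container> containers) {
   for (Container c : containers) {
     ContainerRequest matched = pendingRequests.poll();  // outstanding requests issued by this AM
     if (matched != null) {
       amRMClient.removeContainerRequest(matched);       // stop re-asking for this container
     }
     launchContainer(c);                                 // application-specific launch
   }
 }
 {code}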



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-04 Thread Chun Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572535#comment-14572535
 ] 

Chun Chen commented on YARN-2674:
-

Uploaded YARN-2674.3.patch with a test case and more detailed comments.

 Distributed shell AM may re-launch containers if RM work preserving restart 
 happens
 ---

 Key: YARN-2674
 URL: https://issues.apache.org/jira/browse/YARN-2674
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chun Chen
Assignee: Chun Chen
 Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch


 Currently, if an RM work-preserving restart happens while distributed shell is 
 running, the distributed shell AM may re-launch all the containers, including 
 new/running/complete ones. We must make sure it won't re-launch the 
 running/complete containers.
 We need to remove allocated containers from 
 AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572419#comment-14572419
 ] 

Steve Loughran commented on YARN-2392:
--

checkstyle
{code}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java:1464:
 ' Then click on links to logs of each attempt.\n' have incorrect indentation 
level 8, expected level should be 10.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java:1020:
 Line is longer than 80 characters (found 81).
{code}

 add more diags about app retry limits on AM failures
 

 Key: YARN-2392
 URL: https://issues.apache.org/jira/browse/YARN-2392
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
 YARN-2392-002.patch


 # when an app fails the failure count is shown, but not what the global + 
 local limits are. If the two are different, they should both be printed. 
 # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572376#comment-14572376
 ] 

Devaraj K commented on YARN-41:
---

{code:xml}
-1  pre-patch   19m 45s Pre-patch trunk has 3 extant Findbugs (version 
3.0.0) warnings.
{code}

These Findbugs warnings are not related to the patch here.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1457#comment-1457
 ] 

Hadoop QA commented on YARN-3733:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 28s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 40s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 59s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  50m 24s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737453/0002-YARN-3733.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b5f0d29 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8191/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8191/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8191/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8191/console |


This message was automatically generated.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent tasks 
 5. Switch RM
 Actual
 =
 For 12 jobs the AM gets allocated and all 12 start running
 No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
 Expected
 ===
 Only 6 should be running at a time since the max AM allocation is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572591#comment-14572591
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572597#comment-14572597
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572599#comment-14572599
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572601#comment-14572601
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/CHANGES.txt


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission, nodemanager log show stop but 
 process cannot end. 
 non daemon thread:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and the JNI leveldb thread stack:
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}
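As a generic illustration (not the NodeManager fix itself), the dump above shows a non-daemon JNI leveldb thread still alive; any non-daemon thread keeps the JVM running after the services have stopped, which is the hang being described. A minimal sketch:
{code}
// Minimal sketch: a lingering non-daemon thread prevents JVM exit even after
// main() returns, mirroring the leveldb JNI worker in the dump above.
public class NonDaemonHang {
  public static void main(String[] args) {
    Thread worker = new Thread(() -> {
      while (true) {
        try {
          Thread.sleep(60_000); // pretend to be a background compaction thread
        } catch (InterruptedException e) {
          return; // only an interrupt (or System.exit) lets the JVM terminate
        }
      }
    }, "leveldb-like-worker");
    worker.setDaemon(false); // non-daemon: the JVM waits for it indefinitely
    worker.start();
    System.out.println("main() returned, but the process keeps running");
    // Forceful shutdown paths: worker.interrupt(); or System.exit(0);
  }
}
{code}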



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572596#comment-14572596
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when RM failover happens, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
     YarnConfiguration.RM_ADDRESS,
     YarnConfiguration.DEFAULT_RM_ADDRESS,
     server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and init both 
 RMs before starting them, yarn.resourcemanager.ha.id is changed to rm2 during 
 the init of rm2 and is still rm2 when rm1 starts.
 So I think it is safe to make a copy of the configuration when initializing 
 each RM.
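For illustration, a minimal sketch of the idea (not the committed patch): give each RM in the mini cluster its own copy of the configuration, so setting yarn.resourcemanager.ha.id for rm2 cannot leak into rm1. The helper below is hypothetical; only the YarnConfiguration copy constructor and the RM_HA_ID constant are real YARN APIs.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PerRmConfSketch {
  // Build an isolated Configuration per RM instead of sharing one instance.
  static Configuration confForRm(Configuration base, String rmId) {
    Configuration copy = new YarnConfiguration(base); // copies all properties
    copy.set(YarnConfiguration.RM_HA_ID, rmId);       // e.g. "rm1" or "rm2"
    return copy;
  }

  public static void main(String[] args) {
    Configuration shared = new YarnConfiguration();
    Configuration rm1Conf = confForRm(shared, "rm1");
    Configuration rm2Conf = confForRm(shared, "rm2");
    // Initializing rm2 no longer mutates the config rm1 will start with.
    System.out.println(rm1Conf.get(YarnConfiguration.RM_HA_ID)); // rm1
    System.out.println(rm2Conf.get(YarnConfiguration.RM_HA_ID)); // rm2
  }
}
{code}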



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572604#comment-14572604
 ] 

Hadoop QA commented on YARN-2674:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 42s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:red}-1{color} | javac |   7m 32s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 39s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 30s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   1m  3s | Tests failed in 
hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests |   1m 52s | Tests passed in 
hadoop-yarn-server-tests. |
| | |  40m 27s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-applications-distributedshell |
| Failed unit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels |
|   | hadoop.yarn.applications.distributedshell.TestDSAppMaster |
|   | hadoop.yarn.applications.distributedshell.TestDistributedShell |
|   | hadoop.yarn.applications.distributedshell.TestDistributedShellWithRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737533/YARN-2674.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e830207 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/diffJavacWarnings.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/console |


This message was automatically generated.

 Distributed shell AM may re-launch containers if RM work preserving restart 
 happens
 ---

 Key: YARN-2674
 URL: https://issues.apache.org/jira/browse/YARN-2674
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chun Chen
Assignee: Chun Chen
 Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch


 Currently, if RM work preserving restart happens while distributed shell is 
 running, distribute shell AM may re-launch all the containers, including 
 new/running/complete. We must make sure it won't re-launch the 
 running/complete containers.
 We need to remove allocated containers from 
 AMRMClientImpl#remoteRequestsTable once AM receive them from RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler

2015-06-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572657#comment-14572657
 ] 

Naganarasimha G R commented on YARN-3758:
-

Hi [~rohithsharma], this issue is similar to the one raised in YARN-3525. I feel 
that if {{yarn.scheduler.minimum-allocation-mb}} is specific to the capacity 
scheduler, it would be better to rename it to 
{{yarn.scheduler.capacity.minimum-allocation-mb}}, similar to the suggestion in 
YARN-3525, so that there is less confusion. Thoughts?

 The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
 working as expected in FairScheduler
 

 Key: YARN-3758
 URL: https://issues.apache.org/jira/browse/YARN-3758
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: skrho

 Hello there~~
 I have 2 clusters.
 The first cluster is 5 nodes, one default application queue, CapacityScheduler, 
 8G physical memory per node.
 The second cluster is 10 nodes, 2 application queues, FairScheduler, 230G 
 physical memory per node.
 Whenever a MapReduce job is running, I want the ResourceManager to give each 
 container the minimum memory of 256m.
 So I changed the configuration in yarn-site.xml and mapred-site.xml:
 yarn.scheduler.minimum-allocation-mb : 256
 mapreduce.map.java.opts : -Xms256m 
 mapreduce.reduce.java.opts : -Xms256m 
 mapreduce.map.memory.mb : 256 
 mapreduce.reduce.memory.mb : 256 
 In the first cluster, whenever a MapReduce job is running, I can see used 
 memory of 256m in the web console ( http://installedIP:8088/cluster/nodes ).
 But in the second cluster, whenever a MapReduce job is running, I see used 
 memory of 1024m in the web console ( http://installedIP:8088/cluster/nodes ).
 I know the default memory value is 1024m, so if the memory setting is not 
 changed, the default value takes effect.
 I have been testing for two weeks, but I don't know why the minimum memory 
 setting is not working in the second cluster.
 Why does this difference happen?
 Is my configuration wrong, or is there a bug?
 Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572620#comment-14572620
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resources are not null. That's 
 not always true, as this information is not published to the timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572617#comment-14572617
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when RM failover happens, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
     YarnConfiguration.RM_ADDRESS,
     YarnConfiguration.DEFAULT_RM_ADDRESS,
     server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and init both 
 RMs before starting them, yarn.resourcemanager.ha.id is changed to rm2 during 
 the init of rm2 and is still rm2 when rm1 starts.
 So I think it is safe to make a copy of the configuration when initializing 
 each RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572611#comment-14572611
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}
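As a generic illustration of this failure mode (not the FSParentQueue/QueueManager code itself), a ConcurrentModificationException is thrown when one thread iterates an ArrayList while another thread mutates it; typical remedies are guarding the list with a lock or iterating a snapshot-based structure such as CopyOnWriteArrayList:
{code}
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CmeSketch {
  public static void main(String[] args) throws InterruptedException {
    List<String> queues = new ArrayList<>();
    queues.add("root.a");
    queues.add("root.b");

    Thread writer = new Thread(() -> {
      for (int i = 0; i < 100000; i++) {
        queues.add("root.q" + i); // concurrent structural modification
      }
    });
    writer.start();
    try {
      for (String q : queues) {   // fail-fast iterator may throw a CME here
        q.length();
      }
    } catch (ConcurrentModificationException e) {
      System.out.println("reproduced (timing dependent): " + e);
    }
    writer.join();

    // One common remedy: iterate a snapshot so concurrent adds cannot interfere.
    List<String> safe = new CopyOnWriteArrayList<>(queues);
    for (String q : safe) {
      q.length(); // iterates over an immutable snapshot; no CME
    }
  }
}
{code}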



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572622#comment-14572622
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission the NodeManager log shows it has 
 stopped, but the process does not exit. 
 Non-daemon threads:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and the JNI leveldb thread stack:
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working in container

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572628#comment-14572628
 ] 

Rohith commented on YARN-3758:
--

I had a look into the code for CS and FS. The understanding and behavior of 
minimum allocation is different across CS and FS.
# CS : It is straightforward: if any request asks for less than 
min-allocation-mb, then CS normalizes the request to min-allocation-mb, and 
containers are allocated with minimum-allocation-mb. 
# FS : if any request asks for less than min-allocation-mb, then FS normalizes 
the request using the factor {{yarn.scheduler.increment-allocation-mb}}. In the 
example in the description, min-allocation-mb is 256mb, but 
increment-allocation-mb defaults to 1024mb, so 1024mb is always allocated to 
containers. {{yarn.scheduler.increment-allocation-mb}} therefore has a large 
effect: it changes the requested memory, and containers are assigned the newly 
calculated resource.

The behavior is not consistent between CS and FS. I am not sure why an 
additional configuration was introduced in FS. Is it a bug?
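A minimal sketch of the normalization arithmetic described above (my reading of the behavior, not the actual scheduler code): FS rounds the request up to a multiple of yarn.scheduler.increment-allocation-mb, so a 256 MB ask with the default 1024 MB increment becomes a 1024 MB container, while CS only raises the ask to min-allocation-mb:
{code}
public class NormalizeSketch {
  // FS-style (assumed): apply the minimum, then round up to the nearest
  // multiple of yarn.scheduler.increment-allocation-mb.
  static int normalizeFs(int requestedMb, int minMb, int incrementMb) {
    int atLeastMin = Math.max(requestedMb, minMb);
    return ((atLeastMin + incrementMb - 1) / incrementMb) * incrementMb;
  }

  // CS-style (assumed): simply raise the request to the configured minimum.
  static int normalizeCs(int requestedMb, int minMb) {
    return Math.max(requestedMb, minMb);
  }

  public static void main(String[] args) {
    System.out.println(normalizeCs(256, 256));        // 256
    System.out.println(normalizeFs(256, 256, 1024));  // 1024 -> what the reporter sees
    System.out.println(normalizeFs(256, 256, 256));   // 256  -> with the increment lowered
  }
}
{code}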

 The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
 working in container
 

 Key: YARN-3758
 URL: https://issues.apache.org/jira/browse/YARN-3758
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: skrho

 Hello there~~
 I have 2 clusters.
 The first cluster is 5 nodes, one default application queue, CapacityScheduler, 
 8G physical memory per node.
 The second cluster is 10 nodes, 2 application queues, FairScheduler, 230G 
 physical memory per node.
 Whenever a MapReduce job is running, I want the ResourceManager to give each 
 container the minimum memory of 256m.
 So I changed the configuration in yarn-site.xml and mapred-site.xml:
 yarn.scheduler.minimum-allocation-mb : 256
 mapreduce.map.java.opts : -Xms256m 
 mapreduce.reduce.java.opts : -Xms256m 
 mapreduce.map.memory.mb : 256 
 mapreduce.reduce.memory.mb : 256 
 In the first cluster, whenever a MapReduce job is running, I can see used 
 memory of 256m in the web console ( http://installedIP:8088/cluster/nodes ).
 But in the second cluster, whenever a MapReduce job is running, I see used 
 memory of 1024m in the web console ( http://installedIP:8088/cluster/nodes ).
 I know the default memory value is 1024m, so if the memory setting is not 
 changed, the default value takes effect.
 I have been testing for two weeks, but I don't know why the minimum memory 
 setting is not working in the second cluster.
 Why does this difference happen?
 Is my configuration wrong, or is there a bug?
 Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572584#comment-14572584
 ] 

Junping Du commented on YARN-41:


bq. These findbugs are not related to the patch here.
Agree. Also, the test failure is not related, and the same failure also shows up 
in other patches, like YARN-3248. We should probably file a separate JIRA to fix 
this. Committing the latest patch in.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, the RM should remove and handle an NM 
 that is shut down gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572609#comment-14572609
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-trunk-Commit #7963 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7963/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 

[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572618#comment-14572618
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working in container

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572630#comment-14572630
 ] 

Rohith commented on YARN-3758:
--

bq. Is it bug ?
To be clear, is the inconsistent behavior a bug, or was it implemented 
intentionally for FS?

 The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
 working in container
 

 Key: YARN-3758
 URL: https://issues.apache.org/jira/browse/YARN-3758
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: skrho

 Hello there~~
 I have 2 clusters.
 The first cluster is 5 nodes, one default application queue, CapacityScheduler, 
 8G physical memory per node.
 The second cluster is 10 nodes, 2 application queues, FairScheduler, 230G 
 physical memory per node.
 Whenever a MapReduce job is running, I want the ResourceManager to give each 
 container the minimum memory of 256m.
 So I changed the configuration in yarn-site.xml and mapred-site.xml:
 yarn.scheduler.minimum-allocation-mb : 256
 mapreduce.map.java.opts : -Xms256m 
 mapreduce.reduce.java.opts : -Xms256m 
 mapreduce.map.memory.mb : 256 
 mapreduce.reduce.memory.mb : 256 
 In the first cluster, whenever a MapReduce job is running, I can see used 
 memory of 256m in the web console ( http://installedIP:8088/cluster/nodes ).
 But in the second cluster, whenever a MapReduce job is running, I see used 
 memory of 1024m in the web console ( http://installedIP:8088/cluster/nodes ).
 I know the default memory value is 1024m, so if the memory setting is not 
 changed, the default value takes effect.
 I have been testing for two weeks, but I don't know why the minimum memory 
 setting is not working in the second cluster.
 Why does this difference happen?
 Is my configuration wrong, or is there a bug?
 Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler

2015-06-04 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3758:
-
Summary: The mininum memory setting(yarn.scheduler.minimum-allocation-mb) 
is not working as expected in FairScheduler  (was: The mininum memory 
setting(yarn.scheduler.minimum-allocation-mb) is not working in container)

 The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
 working as expected in FairScheduler
 

 Key: YARN-3758
 URL: https://issues.apache.org/jira/browse/YARN-3758
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: skrho

 Hello there~~
 I have 2 clusters.
 The first cluster is 5 nodes, one default application queue, CapacityScheduler, 
 8G physical memory per node.
 The second cluster is 10 nodes, 2 application queues, FairScheduler, 230G 
 physical memory per node.
 Whenever a MapReduce job is running, I want the ResourceManager to give each 
 container the minimum memory of 256m.
 So I changed the configuration in yarn-site.xml and mapred-site.xml:
 yarn.scheduler.minimum-allocation-mb : 256
 mapreduce.map.java.opts : -Xms256m 
 mapreduce.reduce.java.opts : -Xms256m 
 mapreduce.map.memory.mb : 256 
 mapreduce.reduce.memory.mb : 256 
 In the first cluster, whenever a MapReduce job is running, I can see used 
 memory of 256m in the web console ( http://installedIP:8088/cluster/nodes ).
 But in the second cluster, whenever a MapReduce job is running, I see used 
 memory of 1024m in the web console ( http://installedIP:8088/cluster/nodes ).
 I know the default memory value is 1024m, so if the memory setting is not 
 changed, the default value takes effect.
 I have been testing for two weeks, but I don't know why the minimum memory 
 setting is not working in the second cluster.
 Why does this difference happen?
 Is my configuration wrong, or is there a bug?
 Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573512#comment-14573512
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7970 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7970/])
Add missing test file of YARN-3733 (wangda: rev 
405bbcf68c32d8fd8a83e46e686eacd14e5a533c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java


 Fix DominantRC#compare() does not work as expected if cluster resource is 
 empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RMs and 2 NMs (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB after changing the scheduler 
 minimum allocation to 512 MB
 3. Configure the capacity scheduler and set the AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent tasks 
 5. Switch RM
 Actual
 =
 For 12 jobs the AM gets allocated and all 12 start running
 No other YARN child is initiated, *all 12 jobs stay in RUNNING state forever*
 Expected
 ===
 Only 6 should be running at a time, since the max AM share allowed is .5 (3072 MB)
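As a side note on the summary, here is a rough sketch of why a dominant-share style comparison can degenerate when the cluster resource is empty; this is an assumption about the mechanism for illustration only, not the actual DominantResourceCalculator code:
{code}
public class DominantShareSketch {
  // Dominant share: the larger of the memory share and the vcore share.
  static float dominantShare(long mem, long vcores, long clusterMem, long clusterVcores) {
    float memShare = (float) mem / clusterMem;      // x/0 -> Infinity, 0/0 -> NaN
    float cpuShare = (float) vcores / clusterVcores;
    return Math.max(memShare, cpuShare);
  }

  public static void main(String[] args) {
    // With an empty cluster resource, every non-zero request maps to Infinity,
    // so two clearly different requests compare as equal.
    float a = dominantShare(512, 1, 0, 0);
    float b = dominantShare(3072, 6, 0, 0);
    System.out.println(a + " vs " + b);       // Infinity vs Infinity
    System.out.println(Float.compare(a, b));  // 0 -> treated as equal
  }
}
{code}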



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573515#comment-14573515
 ] 

Zhijie Shen commented on YARN-3044:
---

I'm not sure, because as far as I can tell the NM's impl is different from the 
RM's, but it's up to you to figure out the proper solution :-)

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
 YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN

2015-06-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573468#comment-14573468
 ] 

kassiano josé matteussi commented on YARN-2139:
---

Dears, 

I have been studying resource management for Hadoop applications running inside 
Linux containers, and I have had trouble restricting disk I/O with cgroups 
(bps_write, bps_read). 

Does anybody know if it is possible to do so?

I have heard that I/O limiting with cgroups only applies to synchronous 
(SYNC) writes, and that is why it wouldn't work well with Hadoop + HDFS. Is 
this still true in more recent kernel implementations?

Best Regards,
Kassiano

 [Umbrella] Support for Disk as a Resource in YARN 
 --

 Key: YARN-2139
 URL: https://issues.apache.org/jira/browse/YARN-2139
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
 Attachments: Disk_IO_Isolation_Scheduling_3.pdf, 
 Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, 
 YARN-2139-prototype-2.patch, YARN-2139-prototype.patch


 YARN should consider disk as another resource for (1) scheduling tasks on 
 nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3768) Index out of range exception with environment variables without values

2015-06-04 Thread Joe Ferner (JIRA)
Joe Ferner created YARN-3768:


 Summary: Index out of range exception with environment variables 
without values
 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner


Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
exception occurs if an environment variable is encountered without a value.

I believe this occurs because Java will not return trailing empty strings from 
the split method. Similar to this: 
http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values
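For illustration, a small self-contained sketch of the suspected cause (my reading of the report, not the actual Apps.java code): String.split with the default limit drops trailing empty strings, so an entry like MY_VAR= splits into a single element and indexing the second element throws; passing a negative limit, or checking the array length, avoids that:
{code}
public class EnvSplitSketch {
  public static void main(String[] args) {
    String entry = "MY_VAR=";               // environment variable without a value

    String[] parts = entry.split("=");      // trailing empty strings are dropped
    System.out.println(parts.length);       // 1 -> parts[1] would throw
                                            //      ArrayIndexOutOfBoundsException

    String[] kept = entry.split("=", -1);   // negative limit keeps trailing empties
    System.out.println(kept.length);        // 2 -> kept[1] is ""

    // Alternative guard: check the length before indexing.
    String value = parts.length > 1 ? parts[1] : "";
    System.out.println("value='" + value + "'");
  }
}
{code}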



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)