[jira] [Created] (YARN-1192) Update HAServiceState to STOPPING on RM#stop()
Karthik Kambatla created YARN-1192: -- Summary: Update HAServiceState to STOPPING on RM#stop() Key: YARN-1192 URL: https://issues.apache.org/jira/browse/YARN-1192 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Post HADOOP-9945, we should update HAServiceState in RMHAProtocolService to STOPPING on stop(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
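A minimal sketch of the requested change, assuming the STOPPING value added to HAServiceState by HADOOP-9945 (the field and methods shown here are illustrative, not the actual RMHAProtocolService code):
{code}
import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;

public class RMHAProtocolServiceSketch {
  private HAServiceState haState = HAServiceState.INITIALIZING;

  // Invoked from RM#stop(): mark the service as STOPPING so HA clients
  // querying the state no longer see ACTIVE or STANDBY during shutdown.
  public synchronized void serviceStop() {
    haState = HAServiceState.STOPPING;
  }

  public synchronized HAServiceState getServiceState() {
    return haState;
  }
}
{code}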
[jira] [Updated] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery
[ https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1185: - Summary: FileSystemRMStateStore can leave partial files that prevent subsequent recovery (was: FileSystemRMStateStore doesn't use temporary files when writing data) bq. The RM will not start if there is anything wrong with the stored state. So if some write is partial/empty it will not start. The concern I have about that approach is that it requires manual intervention from ops when there is a problem, and the current scheme can lead to that situation occurring because the RM can crash at arbitrary points. I think the RM should try to prevent that situation from occurring and/or have the ability to automatically recover from that situation if it does occur. The RM could skip the corrupted info and continue if the info is deemed not critical to the overall recovery process. Then we're only involving ops if the corruption is very serious. {quote} So we could do the following. Storing app data may continue to be optimistic and since that's the main workload we continue to do what we do today. Storing global data (mainly the security stuff) can change to be more atomic. {quote} That sounds reasonable, especially if the RM is more robust during recovery. I understand it's a tradeoff between reliability and performance, especially with the RPC overhead when talking to HDFS and the potentially high rate of state churn. Thanks for the informative discussion, [~bikassaha]! Updating the summary to better reflect the problem and not a particular solution. FileSystemRMStateStore can leave partial files that prevent subsequent recovery --- Key: YARN-1185 URL: https://issues.apache.org/jira/browse/YARN-1185 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state. To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
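A minimal sketch of the write-to-a-temporary-file-then-rename approach described above, assuming the Hadoop FileSystem API (the class, method, and path names are illustrative, not the actual FileSystemRMStateStore patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AtomicStateWriteSketch {
  public static void writeState(FileSystem fs, Path dest, byte[] data) throws Exception {
    Path tmp = new Path(dest.getParent(), dest.getName() + ".tmp");
    FSDataOutputStream out = fs.create(tmp, true);
    try {
      out.write(data);
    } finally {
      out.close();
    }
    // Only a fully written file is ever moved into place; a crash before this
    // point leaves at most a stray .tmp file, never a partial destination file.
    fs.rename(tmp, dest);
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    writeState(fs, new Path("/tmp/rmstore/app_0001"), "state".getBytes());
  }
}
{code}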
[jira] [Commented] (YARN-1024) Define a virtual core unambiguously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766124#comment-13766124 ] Eli Collins commented on YARN-1024: --- bq. keeping virtual cores to express parallelism sounds good as it is clear it is not a real core. Hm, I read this the other way. If a framework asks for three vcores on a host it intends to run some code on three real physical cores at the same time. If a long-lived framework wants to reserve 2 cores per host it would ask for 2 cores (and 100% YCU per core). Sandy's proposal, switching to cores and YCU instead of just vcores, is equivalent to the proposal above of getting rid of vcores and supporting fractional cores. A vcore becomes a core and YCU is just a way to express that you want a fraction of a core. Sounds good to me. Define a virtual core unambiguously --- Key: YARN-1024 URL: https://issues.apache.org/jira/browse/YARN-1024 Project: Hadoop YARN Issue Type: Improvement Reporter: Arun C Murthy Assignee: Arun C Murthy Attachments: CPUasaYARNresource.pdf We need to clearly define the meaning of a virtual core unambiguously so that it's easy to migrate applications between clusters. For example, here is Amazon EC2's definition of an ECU: http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it Essentially we need to clearly define a YARN Virtual Core (YVC). Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.* -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1193) ResourceManager.clusterTimeStamp should be reset when RM transitions to active
Karthik Kambatla created YARN-1193: -- Summary: ResourceManager.clusterTimeStamp should be reset when RM transitions to active Key: YARN-1193 URL: https://issues.apache.org/jira/browse/YARN-1193 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla ResourceManager.clusterTimeStamp is used to generate application-ids. Currently, when the RM transitions active-standby-active back and forth, the clusterTimeStamp stays the same, leading to apps getting the same ids as jobs from before. This leads to other races, e.g., in the staging directory. To avoid this, it is better to reset it on every transition to Active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
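A minimal sketch of why resetting the timestamp matters (illustrative only, not the actual RM code): application ids embed clusterTimeStamp, so reusing the old value across an active-standby-active cycle reproduces the same ids.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class ClusterTimeStampSketch {
  static long clusterTimeStamp = System.currentTimeMillis();

  static void transitionToActive() throws InterruptedException {
    Thread.sleep(10);
    // Proposed behaviour: take a fresh timestamp on every transition to
    // Active so newly generated ids cannot collide with pre-failover ones.
    clusterTimeStamp = System.currentTimeMillis();
  }

  public static void main(String[] args) throws InterruptedException {
    ApplicationId first = ApplicationId.newInstance(clusterTimeStamp, 1);
    transitionToActive();
    ApplicationId second = ApplicationId.newInstance(clusterTimeStamp, 1);
    System.out.println(first + " vs " + second); // differ in the timestamp part
  }
}
{code}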
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766141#comment-13766141 ] Junping Du commented on YARN-311: - Thanks, Luke, for the review and comments! Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. This jira will only contain changes in the scheduler. The flow to update a node's resource and make resource scheduling aware of it is: 1. The resource update comes through the admin API to the RM and takes effect on RMNodeImpl. 2. When the next NM heartbeat for updating status comes, the RMNode's resource change is detected and the delta resource is added to the SchedulerNode's availableResource before actual scheduling happens. 3. The scheduler does resource allocation according to the new availableResource in SchedulerNode. For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
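A minimal sketch of step 2 above, using the Resources utility class (illustrative; not the actual SchedulerNode code): apply the delta between the new total and the old total to the node's available resource when the next heartbeat is processed.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class NodeResourceUpdateSketch {
  public static void main(String[] args) {
    Resource oldTotal = Resource.newInstance(8 * 1024, 8);
    Resource newTotal = Resource.newInstance(12 * 1024, 12); // set via the admin API
    Resource available = Resource.newInstance(2 * 1024, 3);  // before the update

    Resource delta = Resources.subtract(newTotal, oldTotal);
    Resources.addTo(available, delta); // scheduling then uses the new value

    System.out.println("available after update: " + available);
  }
}
{code}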
[jira] [Created] (YARN-1186) Add support for simulating several important behaviors in the MRAM to yarn scheduler simulator
Wei Yan created YARN-1186: - Summary: Add support for simulating several important behaviors in the MRAM to yarn scheduler simulator Key: YARN-1186 URL: https://issues.apache.org/jira/browse/YARN-1186 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Add support for simulating some important behaviors in the MRAM (such as slowstart, headroom, etc) to the Yarn scheduler load simulator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1027) Implement RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1027: --- Attachment: yarn-1027-7.patch Implement RMHAProtocolService - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, yarn-1027-6.patch, yarn-1027-7.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1027) Implement RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766162#comment-13766162 ] Karthik Kambatla commented on YARN-1027: bq. Should createAndInit/Start/Stop methods in RM be synchronized? Can they race with other activity in the RM happening on the dispatcher thread? stop() is equivalent to stopping the RM previously. createAndInit/start also don't change the behavior in any way. Their callers themselves are synchronized, so I don't see the need to synchronize these as well. bq. Was getClusterTimeStamp() addition necessary? It's good to keep refactorings separate. Filed another subtask under YARN-149 to address this. Fixed all other comments. Running a pseudo-cluster with YARN-1068 applied - will update here on the scenarios shortly. Implement RMHAProtocolService - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, yarn-1027-6.patch, yarn-1027-7.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator
Wei Yan created YARN-1187: - Summary: Add discrete event-based simulation to yarn scheduler simulator Key: YARN-1187 URL: https://issues.apache.org/jira/browse/YARN-1187 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Following the discussion in YARN-1021: discrete event simulation decouples the simulation run from any real-world clock. This allows users to step through the execution, set debug points, and get a deterministic re-execution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1027) Implement RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766174#comment-13766174 ] Hadoop QA commented on YARN-1027: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12602941/yarn-1027-7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1908//console This message is automatically generated. Implement RMHAProtocolService - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, yarn-1027-6.patch, yarn-1027-7.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1194) TestContainerLogsPage test fails on trunk
Roman Shaposhnik created YARN-1194: -- Summary: TestContainerLogsPage test fails on trunk Key: YARN-1194 URL: https://issues.apache.org/jira/browse/YARN-1194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Running TestContainerLogsPage on trunk while Native IO is enabled makes it fail -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1027) Implement RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1027: --- Attachment: yarn-1027-7.patch Implement RMHAProtocolService - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, yarn-1027-6.patch, yarn-1027-7.patch, yarn-1027-7.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765778#comment-13765778 ] Wei Yan commented on YARN-905: -- [~sandyr], [~vinodkv] Confused last time. I'll close YARN-1126 and update a patch supporting new features here. Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan Attachments: Yarn-905.patch, YARN-905.patch, YARN-905.patch It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1193) ResourceManager.clusterTimeStamp should be reset when RM transitions to active
[ https://issues.apache.org/jira/browse/YARN-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1193: --- Attachment: yarn-1193-1.patch Straight-forward patch. ResourceManager.clusterTimeStamp should be reset when RM transitions to active - Key: YARN-1193 URL: https://issues.apache.org/jira/browse/YARN-1193 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1193-1.patch ResourceManager.clusterTimeStamp is used to generate application-ids. Currently, when the RM transitions active-standby-active back and forth, the clusterTimeStamp stays the same, leading to apps getting the same ids as jobs from before. This leads to other races, e.g., in the staging directory. To avoid this, it is better to reset it on every transition to Active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1130) Improve the log flushing for tasks when mapred.userlog.limit.kb is set
[ https://issues.apache.org/jira/browse/YARN-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Han updated YARN-1130: --- Attachment: YARN-1130.patch Improve the log flushing for tasks when mapred.userlog.limit.kb is set -- Key: YARN-1130 URL: https://issues.apache.org/jira/browse/YARN-1130 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.5-alpha Reporter: Paul Han Attachments: YARN-1130.patch When the userlog limit is set with something like this:
{code}
<property>
  <name>mapred.userlog.limit.kb</name>
  <value>2048</value>
  <description>The maximum size of user-logs of each task in KB. 0 disables the cap.</description>
</property>
{code}
the log entries will be truncated randomly for the jobs. The log size is left between 1.2MB and 1.6MB. Since the log is already limited, avoiding log truncation is crucial for the user. The other issue with the current impl (org.apache.hadoop.yarn.ContainerLogAppender) is that log entries will not be flushed to file until the container shuts down and the log manager closes all appenders. If the user wants to see the log during task execution, that is not supported. Will propose a patch to add a flush mechanism and also flush the log when the task is done. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
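A minimal sketch of the kind of flush mechanism proposed above, assuming log4j 1.2's FileAppender (the class name and the 5-second interval are illustrative, not the actual YARN-1130 patch):
{code}
import java.util.Timer;
import java.util.TimerTask;
import org.apache.log4j.FileAppender;

public class FlushingContainerLogAppender extends FileAppender {
  private final Timer flushTimer = new Timer("container-log-flush", true);

  @Override
  public void activateOptions() {
    super.activateOptions();
    // Periodically flush buffered entries so users can follow the log while
    // the task is still running, instead of waiting for the appender to close.
    flushTimer.schedule(new TimerTask() {
      @Override
      public void run() {
        synchronized (FlushingContainerLogAppender.this) {
          if (qw != null) {
            qw.flush();
          }
        }
      }
    }, 5000L, 5000L);
  }

  @Override
  public synchronized void close() {
    flushTimer.cancel();
    super.close(); // flushes and closes the file when the task is done
  }
}
{code}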
[jira] [Commented] (YARN-1027) Implement RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766203#comment-13766203 ] Hadoop QA commented on YARN-1027: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12602948/yarn-1027-7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1909//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1909//console This message is automatically generated. Implement RMHAProtocolService - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, yarn-1027-6.patch, yarn-1027-7.patch, yarn-1027-7.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766208#comment-13766208 ] Omkar Vinit Joshi commented on YARN-1189: - Yes this is clearly a leak... I had this locally when I was working on it..somehow missed the change.. It should be called when application completely finishes..Attaching a quick patch... NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-1189: --- Assignee: Omkar Vinit Joshi NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1189: Attachment: YARN-1189-20130912.1.patch NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-1189-20130912.1.patch The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765878#comment-13765878 ] Zhijie Shen commented on YARN-867: -- Think about the problem again. Essentially, problem is the implementation of AuxiliaryService may throw RuntimeException (or other Throwable), and fail the thread of NM dispatcher. Wrapping the calling statements with try/catch can basically prevent NM failure. The next task is to handle the throwable from AuxiliaryService. In previous thread, what we plan to do is to fail the container directly, and let the AM know that the container is failed due to AUXSERVICE_FAILED. For MR, it may be okay, because without ShuffleHandler, MR jobs cannot run properly. However, should NM always make the decision to fail the container? I'm concerned that: 1. NM doesn't know what the AuxiliaryService serves the application and how important it is. 2. NM doesn't know how critical the exception is, or whether it is transit or reproducible. Therefore, if the application can toleran Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-1183: -- Attachment: YARN-1183--n2.patch Attaching an updated patch. Updated the name of the wait method. Changed the way it gets notifications when app masters get registered/unregistered so now ApplicationAttemptId is used as the key. MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183.patch As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends it's last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766221#comment-13766221 ] Hadoop QA commented on YARN-1189: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12602954/YARN-1189-20130912.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.application.TestApplication org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1910//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1910//console This message is automatically generated. NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-1189-20130912.1.patch The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Attachment: yarn-1068-1.patch Updated patch to capture the updates to YARN-1027. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1068-1.patch, yarn-1068-prelim.patch To transitionTo{Active,Standby} etc. we should support admin operations the same way DFS does. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1184) ClassCastException is thrown during preemption when a huge job is submitted to queue B whose resources are used by a job in queue A
[ https://issues.apache.org/jira/browse/YARN-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-1184: --- Assignee: Devaraj K ClassCastException is thrown during preemption when a huge job is submitted to queue B whose resources are used by a job in queue A --- Key: YARN-1184 URL: https://issues.apache.org/jira/browse/YARN-1184 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: J.Andreina Assignee: Devaraj K Preemption is enabled. Queues = a, b; a capacity = 30%, b capacity = 70%. Step 1: Assign a big job to queue a (so that job_a will utilize some resources from queue b). Step 2: Assign a big job to queue b. The following exception is thrown at the Resource Manager: {noformat} 2013-09-12 10:42:32,535 ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception. java.lang.ClassCastException: java.util.Collections$UnmodifiableSet cannot be cast to java.util.NavigableSet at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getContainersToPreempt(ProportionalCapacityPreemptionPolicy.java:403) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:202) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:173) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) at java.lang.Thread.run(Thread.java:662) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
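A minimal, self-contained illustration of the exception in the stack trace above (not the actual RM code): an unmodifiable view cannot be cast back to NavigableSet, so the usual fix is to copy into a sorted set rather than cast.
{code}
import java.util.Collections;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class NavigableSetCastSketch {
  public static void main(String[] args) {
    NavigableSet<Integer> live = new TreeSet<Integer>();
    live.add(1);
    Set<Integer> view = Collections.unmodifiableSet(live);

    // Throws java.lang.ClassCastException at runtime, as in the trace above:
    // NavigableSet<Integer> bad = (NavigableSet<Integer>) view;

    // Copying avoids the cast entirely:
    NavigableSet<Integer> copy = new TreeSet<Integer>(view);
    System.out.println(copy);
  }
}
{code}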
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765884#comment-13765884 ] Zhijie Shen commented on YARN-867: -- Sorry to post the broken comment before. Thinking about the problem again: essentially, the problem is that the implementation of AuxiliaryService may throw a RuntimeException (or other Throwable) and fail the NM dispatcher thread. Wrapping the calling statements with try/catch can basically prevent NM failure. The next task is to handle the throwable from the AuxiliaryService. In the previous thread, what we planned to do was fail the container directly, and let the AM know that the container failed due to AUXSERVICE_FAILED. For MR, it may be okay, because without the ShuffleHandler, MR jobs cannot run properly. However, should the NM always make the decision to fail the container? I'm concerned that: 1. The NM doesn't know what the AuxiliaryService does for the application and how important it is. 2. The NM doesn't know how critical the exception is, or whether it is transient or reproducible. So what if the application can tolerate the AuxiliaryService failure? For example, if the AuxiliaryService just does some node-local monitoring work, the application can complete with the AuxiliaryService not working. Therefore, I'm wondering whether we should leave the decision to the AM. The application knows how to handle the exception best. The NM just needs to expose the failure of the AuxiliaryService to the application in some way. Thoughts? Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
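A minimal, self-contained sketch of the try/catch isolation discussed above (the handler interface and names are illustrative, not the actual NM/AuxServices code):
{code}
public class AuxServiceIsolationSketch {
  interface AuxHandler {
    void onAppInit(String appId);
  }

  static void dispatchSafely(AuxHandler handler, String appId) {
    try {
      handler.onAppInit(appId);
    } catch (Throwable t) {
      // The dispatcher thread survives a misbehaving aux service; the failure
      // would then be surfaced to the container/AM rather than killing the NM.
      System.err.println("Aux service failed for " + appId + ": " + t);
    }
  }

  public static void main(String[] args) {
    dispatchSafely(new AuxHandler() {
      @Override
      public void onAppInit(String appId) {
        throw new RuntimeException("bad data from application");
      }
    }, "application_1");
    System.out.println("dispatcher still alive");
  }
}
{code}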
[jira] [Commented] (YARN-1027) Implement RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766227#comment-13766227 ] Karthik Kambatla commented on YARN-1027: For manual testing, I applied the patches yarn-1027-7.patch, yarn-1193-1.patch (update clusterTimeStamp on transitionToActive), and yarn-1068-1.patch (support for admin commands). Testing steps: # *Start the RM*. Verified it was in Standby mode - Log, webui check, netstat for ports check. jmap histo showed 168465 objects with 19476960 bytes. # *Transition to Active*. Verified it was in Active mode - Log, webui, NM connected, netstat for ports check. jmap histo showed 253430 objects with 31171544 bytes. # *Run MR pi job*. Job finished successfully. WebUI worked as expected. jmap histo showed 288406 objects with 35726096 bytes. # *Transition to Standby*. Verified it was in Standby mode. jmap histo showed 282392 objects with 33975600 bytes. # Repeated steps 2, 3, 4 once more. # *Stop the RM*. RM stopped as expected and the logs didn't show any untoward exceptions etc. Implement RMHAProtocolService - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, yarn-1027-6.patch, yarn-1027-7.patch, yarn-1027-7.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766237#comment-13766237 ] Andrey Klochkov commented on YARN-1183: --- bq. MiniYARNCluster is used by several tests. This might bite us if and when we run tests in parallel. Concurrency level won't make any difference even with that. BTW I'm actually running MR tests in parallel now. That's when this issue with cluster shutdown working incorrectly becomes more evident. Thanks for catching the issue with the synchronized block; fixing it. MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183.patch As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends it's last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1195) RM may relaunch already KILLED / FAILED jobs after RM restarts
Jian He created YARN-1195: - Summary: RM may relaunch already KILLED / FAILED jobs after RM restarts Key: YARN-1195 URL: https://issues.apache.org/jira/browse/YARN-1195 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Just like YARN-540: the RM restarts after the job is killed/failed, but before the app state info is cleaned from the store. The next time the RM comes back, it will relaunch the job again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1194) TestContainerLogsPage test fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik updated YARN-1194: --- Attachment: YARN-1194.patch.txt TestContainerLogsPage test fails on trunk - Key: YARN-1194 URL: https://issues.apache.org/jira/browse/YARN-1194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Attachments: YARN-1194.patch.txt Running TestContainerLogsPage on trunk while Native IO is enabled makes it fail -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765904#comment-13765904 ] Karthik Kambatla commented on YARN-1183: bq. It may be an unnecessary optimization in the testing code MiniYARNCluster is used by several tests. This might bite us if and when we run tests in parallel. bq. Can you advise on how to get ApplicationID from RegisterApplicationMasterRequest/RegisterApplicationMasterResponse? Here, using host:port should be good - only a single application runs on the host:port at any point. Also, in the following code, the while() should also be inside the synchronized block. Otherwise, it is possible to lose notifications and wait longer than needed.
{code}
while (!appMasters.isEmpty() && System.currentTimeMillis() - started < timeoutMillis) {
  synchronized (appMasters) {
    appMasters.wait(1000);
  }
}
{code}
MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183.patch As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends it's last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
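A minimal sketch of the suggested fix, reusing the variable names from the snippet above (illustrative, not the final patch): acquire the monitor before checking the condition so a notification arriving between the check and the wait() cannot be lost.
{code}
synchronized (appMasters) {
  while (!appMasters.isEmpty()
      && System.currentTimeMillis() - started < timeoutMillis) {
    appMasters.wait(1000);
  }
}
{code}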
[jira] [Commented] (YARN-1194) TestContainerLogsPage test fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766245#comment-13766245 ] Hadoop QA commented on YARN-1194: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12602959/YARN-1194.patch.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1911//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1911//console This message is automatically generated. TestContainerLogsPage test fails on trunk - Key: YARN-1194 URL: https://issues.apache.org/jira/browse/YARN-1194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Fix For: 2.1.1-beta Attachments: YARN-1194.patch.txt Running TestContainerLogsPage on trunk while Native IO is enabled makes it fail -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-540: - Attachment: YARN-540.7.patch Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch, YARN-540.7.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766274#comment-13766274 ] Hadoop QA commented on YARN-540: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12602971/YARN-540.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1912//console This message is automatically generated. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch, YARN-540.7.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1191) [YARN-321] Update artifact versions for application history service
[ https://issues.apache.org/jira/browse/YARN-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-1191: Summary: [YARN-321] Update artifact versions for application history service (was: [YARN-321] Compilation is failing for YARN-321 branch) [YARN-321] Update artifact versions for application history service --- Key: YARN-1191 URL: https://issues.apache.org/jira/browse/YARN-1191 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1191-1.patch Compilation is failing for YARN-321 branch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1191) [YARN-321] Update artifact versions for application history service
[ https://issues.apache.org/jira/browse/YARN-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766276#comment-13766276 ] Devaraj K commented on YARN-1191: - +1, Patch looks good to me. [YARN-321] Update artifact versions for application history service --- Key: YARN-1191 URL: https://issues.apache.org/jira/browse/YARN-1191 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1191-1.patch Compilation is failing for YARN-321 branch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1196) LocalDirsHandlerService never changes failedDirs back to normal even when these disks turn good
Nemon Lou created YARN-1196: --- Summary: LocalDirsHandlerService never changes failedDirs back to normal even when these disks turn good Key: YARN-1196 URL: https://issues.apache.org/jira/browse/YARN-1196 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Nemon Lou A simple way to reproduce it: 1. Change the access mode of one node manager's local-dirs to 000. After a few seconds, this node manager will become unhealthy. 2. Change the access mode of one node manager's local-dirs back to normal. The node manager is still unhealthy with all local-dirs in bad state even after a long time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1196) LocalDirsHandlerService never changes failedDirs back to normal even when these disks turn good
[ https://issues.apache.org/jira/browse/YARN-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated YARN-1196: Description: A simple way to reproduce it: 1. Change the access mode of one node manager's local-dirs to 000. After a few seconds, this node manager will become unhealthy. 2. Change the access mode of the node manager's local-dirs back to normal. The node manager is still unhealthy with all local-dirs in bad state even after a long time. was: A simple way to reproduce it: 1. Change the access mode of one node manager's local-dirs to 000. After a few seconds, this node manager will become unhealthy. 2. Change the access mode of one node manager's local-dirs back to normal. The node manager is still unhealthy with all local-dirs in bad state even after a long time. LocalDirsHandlerService never changes failedDirs back to normal even when these disks turn good -- Key: YARN-1196 URL: https://issues.apache.org/jira/browse/YARN-1196 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Nemon Lou A simple way to reproduce it: 1. Change the access mode of one node manager's local-dirs to 000. After a few seconds, this node manager will become unhealthy. 2. Change the access mode of the node manager's local-dirs back to normal. The node manager is still unhealthy with all local-dirs in bad state even after a long time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
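A minimal sketch of the behaviour the report asks for, assuming a periodic re-check of previously failed dirs (the method and the goodDirs/failedDirs names are illustrative, not the actual LocalDirsHandlerService fields):
{code}
import java.io.File;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.util.DiskChecker;
import org.apache.hadoop.util.DiskChecker.DiskErrorException;

public class DirRecheckSketch {
  public static void recheck(List<String> goodDirs, List<String> failedDirs) {
    for (Iterator<String> it = failedDirs.iterator(); it.hasNext();) {
      String dir = it.next();
      try {
        DiskChecker.checkDir(new File(dir)); // passes again once permissions are restored
        it.remove();
        goodDirs.add(dir);                   // move the dir back to the good list
      } catch (DiskErrorException e) {
        // still bad; keep it in failedDirs until the next check
      }
    }
  }
}
{code}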
[jira] [Commented] (YARN-1197) Add container merge support in YARN
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766343#comment-13766343 ] Wangda Tan commented on YARN-1197: -- I don't know whether it is possible to add this on the RM or NM side. And I think it should make it easier to move some existing applications (OpenMPI, PBS, etc.) to the YARN platform, because such applications have their own daemons in their original implementations, and container merge can help them leverage their original logic with fewer modifications to become residents of YARN :) Welcome your suggestions and comments! -- Thanks, Wangda Add container merge support in YARN --- Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Currently, YARN does not support merging several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is: some applications (like OpenMPI) have their own daemon on each node (one for each node) in their original implementation, and their user's processes are directly launched by the local daemon (like the task-tracker in MRv1, but per-application). Many functionalities depend on the pipes created when a process is forked by its parent, like IO-forwarding and process monitoring (it does more logic than what the NM does for us), and losing them may cause some scalability issues. A very common resource request in the MPI world is: give me 100G of memory in the cluster, and I will launch 100 processes within this resource. In current YARN, we have the following two choices to make this happen: 1) Send allocation requests with 1G memory iteratively, until we get 100G of memory in total. Then ask the NMs to launch the 100 MPI processes. That will cause some problems like no support for IO-forwarding, process monitoring, etc. as mentioned above. 2) Send a larger resource request, like 10G. But we may encounter the following problems: 2.1 Such a large resource request is hard to get at one time. 2.2 We cannot use more resources than the number we specified on the node (we can only launch one daemon on one node). 2.3 It is hard to decide how much resource to ask for. So my proposal is: 1) We can incrementally send resource requests with small resources like before, until we get enough resources in total. 2) Merge resources on the same node, making only one big container on each node. 3) Launch a daemon on each node, and the daemon will spawn its local processes and manage them. For example, we need to run 10 processes, 1G each, and finally we got containers 1, 2, 3, 4, 5 on node1, containers 6, 7, 8 on node2, and containers 9, 10 on node3. Then we will: merge [1, 2, 3, 4, 5] into container_11 with 5G, launch a daemon, and the daemon will launch 5 processes; merge [6, 7, 8] into container_12 with 3G, launch a daemon, and the daemon will launch 3 processes; merge [9, 10] into container_13 with 2G, launch a daemon, and the daemon will launch 2 processes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1197) Add container merge support in YARN
Wangda Tan created YARN-1197: Summary: Add container merge support in YARN Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Currently, YARN does not support merging several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is: some applications (like OpenMPI) have their own daemon on each node (one for each node) in their original implementation, and their user's processes are directly launched by the local daemon (like the task-tracker in MRv1, but per-application). Many functionalities depend on the pipes created when a process is forked by its parent, like IO-forwarding and process monitoring (it does more logic than what the NM does for us), and losing them may cause some scalability issues. A very common resource request in the MPI world is: give me 100G of memory in the cluster, and I will launch 100 processes within this resource. In current YARN, we have the following two choices to make this happen: 1) Send allocation requests with 1G memory iteratively, until we get 100G of memory in total. Then ask the NMs to launch the 100 MPI processes. That will cause some problems like no support for IO-forwarding, process monitoring, etc. as mentioned above. 2) Send a larger resource request, like 10G. But we may encounter the following problems: 2.1 Such a large resource request is hard to get at one time. 2.2 We cannot use more resources than the number we specified on the node (we can only launch one daemon on one node). 2.3 It is hard to decide how much resource to ask for. So my proposal is: 1) We can incrementally send resource requests with small resources like before, until we get enough resources in total. 2) Merge resources on the same node, making only one big container on each node. 3) Launch a daemon on each node, and the daemon will spawn its local processes and manage them. For example, we need to run 10 processes, 1G each, and finally we got containers 1, 2, 3, 4, 5 on node1, containers 6, 7, 8 on node2, and containers 9, 10 on node3. Then we will: merge [1, 2, 3, 4, 5] into container_11 with 5G, launch a daemon, and the daemon will launch 5 processes; merge [6, 7, 8] into container_12 with 3G, launch a daemon, and the daemon will launch 3 processes; merge [9, 10] into container_13 with 2G, launch a daemon, and the daemon will launch 2 processes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766290#comment-13766290 ] Bikas Saha commented on YARN-540: - bq. Delete throws exception in case of not-existing If that is the case, then why didn't this code in the previous patch cause an exception to be thrown for a normal job? It is removing an app that should already have been removed after unregister. {code} + // application completely done and remove from state store. + // App state may be already removed during RMAppFinishingOrRemovingTransition. + RMStateStore store = app.rmContext.getStateStore(); + store.removeApplication(app) {code} bq. it should not be possible to generate RMAppEventType.ATTEMPT_FAILED event at that state Can the app crash while it's waiting to be unregistered? Will that generate an ATTEMPT_FAILED? Can the node crash and cause an ATTEMPT_FAILED? If yes, then these would apply to the FINISHING state also. bq. In case of REMOVING, return YARNApplicationState as RUNNING, makes sense? In general an app can be removed while it's in the ACCEPTED state as well (kill app after submission); these should also go through the REMOVING state. So it's not necessarily the case that the app state will always be RUNNING. We probably need to save the previous state and return that while the app is in the REMOVING state. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch, YARN-540.7.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
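Since the same application can legitimately be removed twice (once during RMAppFinishingOrRemovingTransition and again in the final transition), one way to keep the second call harmless is for the store to treat "already gone" as success. The sketch below is only an assumption loosely modeled on a FileSystem-backed store, not the actual YARN-540 patch; the path layout and method name are illustrative.
{code}
// Minimal sketch: make removal idempotent so a duplicate remove does not throw.
public synchronized void removeApplicationState(ApplicationId appId) throws Exception {
  Path appDir = new Path(rmAppRoot, appId.toString());
  if (fs.exists(appDir)) {
    fs.delete(appDir, true);   // recursively delete app and attempt state
  } else {
    LOG.info("Application " + appId + " was already removed from the state store");
  }
}
{code}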
[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766597#comment-13766597 ] Hadoop QA commented on YARN-1183: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603037/YARN-1183--n3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1916//console This message is automatically generated. MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, YARN-1183.patch As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends it's last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766572#comment-13766572 ] Xuan Gong commented on YARN-978: bq. I'm fine with remove it, but trackingUrl is on web UI according to the latest patches in YARN-954 and YARN-1023. If it is to be removed, we should leave note there. Yes, we might still need trackingUrl. Originally, I thought the trackingUrl would be set to null, so we might not need it. But I checked the code again. Actually, this is from the ApplicationMaster: {code} resourceManager.unregisterApplicationMaster(appStatus, appMessage, null); {code} Since this ApplicationMaster can be rewritten or provided by the client, this can change too (at least the MR ApplicationMaster sets the trackingUrl to a non-null value). So we can add trackingUrl to the ApplicationAttemptReport. And I agree that the logUrl should go to the ContainerReport. [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch We dont have ApplicationAttemptReport and Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
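For context, the record being shaped by this discussion would look roughly like the sketch below. The field set and accessor names are assumptions in YARN's usual abstract-record style, not the committed YARN-978 API.
{code}
// Rough sketch of the fields under discussion for ApplicationAttemptReport.
// logUrl is deliberately absent since, per the comment above, it belongs in ContainerReport.
public abstract class ApplicationAttemptReport {
  public abstract ApplicationAttemptId getApplicationAttemptId();
  public abstract String getHost();
  public abstract int getRpcPort();
  public abstract String getTrackingUrl();   // kept, per the discussion above
  public abstract String getDiagnostics();
  public abstract YarnApplicationAttemptState getYarnApplicationAttemptState();
  public abstract ContainerId getAMContainerId();
}
{code}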
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766565#comment-13766565 ] Hadoop QA commented on YARN-1157: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603030/YARN-1157.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1915//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1915//console This message is automatically generated. ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1194) TestContainerLogsPage fails with native builds
[ https://issues.apache.org/jira/browse/YARN-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766547#comment-13766547 ] Hudson commented on YARN-1194: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4408 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4408/]) YARN-1194. TestContainerLogsPage fails with native builds. Contributed by Roman Shaposhnik (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1522968) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java TestContainerLogsPage fails with native builds -- Key: YARN-1194 URL: https://issues.apache.org/jira/browse/YARN-1194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Attachments: YARN-1194.patch.txt Running TestContainerLogsPage on trunk while Native IO is enabled makes it fail -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766550#comment-13766550 ] Hadoop QA commented on YARN-905: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603027/YARN-905-addendum.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1914//console This message is automatically generated. Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan Attachments: YARN-905-addendum.patch, Yarn-905.patch, YARN-905.patch, YARN-905.patch It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1194) TestContainerLogsPage test fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766516#comment-13766516 ] Jason Lowe commented on YARN-1194: -- +1, lgtm. TestContainerLogsPage test fails on trunk - Key: YARN-1194 URL: https://issues.apache.org/jira/browse/YARN-1194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Fix For: 2.1.1-beta Attachments: YARN-1194.patch.txt Running TestContainerLogsPage on trunk while Native IO is enabled makes it fail -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1194) TestContainerLogsPage fails with native builds
[ https://issues.apache.org/jira/browse/YARN-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1194: - Fix Version/s: (was: 2.1.1-beta) Summary: TestContainerLogsPage fails with native builds (was: TestContainerLogsPage test fails on trunk) Adjusting summary in preparation for commit since this affects more than trunk. Also the Fix Version normally should not be set until the patch has been committed. TestContainerLogsPage fails with native builds -- Key: YARN-1194 URL: https://issues.apache.org/jira/browse/YARN-1194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Attachments: YARN-1194.patch.txt Running TestContainerLogsPage on trunk while Native IO is enabled makes it fail -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
[ https://issues.apache.org/jira/browse/YARN-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766509#comment-13766509 ] Hudson commented on YARN-1078: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1547 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1547/]) YARN-1078. TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1522644) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows - Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.1-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-1078.2.patch, YARN-1078.3.patch, YARN-1078.branch-2.patch, YARN-1078.patch The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! 
org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-1183: -- Attachment: YARN-1183--n4.patch MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, YARN-1183--n4.patch, YARN-1183.patch As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends it's last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766556#comment-13766556 ] Xuan Gong commented on YARN-1157: - The reason is that RMAppAttemptImpl::generateProxyUriWithoutScheme(String) {code} return result.toASCIIString().substring(HttpConfig.getSchemePrefix().length()); {code} can return an empty String, but WebAppProxyServlet only checks whether the urlString is null or not; we should also check for an empty string. ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
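A minimal sketch of the check being described, with illustrative names rather than the exact WebAppProxyServlet code:
{code}
// Treat an empty tracking URL the same as a missing one so the proxy can render a
// friendly page instead of failing with HTTP 500. Helper names are assumptions.
String originalUri = appReport.getOriginalTrackingUrl();
if (originalUri == null || originalUri.trim().isEmpty()) {
  notifyAMNotRunning(resp, appId);   // hypothetical helper: show an informative page
  return;
}
// ... otherwise build the proxied URI from originalUri as before
{code}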
[jira] [Updated] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-1183: -- Attachment: YARN-1183--n3.patch Attaching an updated patch MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, YARN-1183.patch As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends it's last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766634#comment-13766634 ] Hadoop QA commented on YARN-1183: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603041/YARN-1183--n4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1917//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1917//console This message is automatically generated. MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, YARN-1183--n4.patch, YARN-1183.patch As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends it's last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1193) ResourceManger.clusterTimeStamp should be reset when RM transitions to active
[ https://issues.apache.org/jira/browse/YARN-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766645#comment-13766645 ] Bikas Saha commented on YARN-1193: -- Isn't this already fixed in YARN-1027? Did you mean to open this jira to create a getClusterTimestamp() method? ResourceManger.clusterTimeStamp should be reset when RM transitions to active - Key: YARN-1193 URL: https://issues.apache.org/jira/browse/YARN-1193 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1193-1.patch ResourceManager.clusterTimeStamp is used to generate application-ids. Currently, when the RM transitions to active-standby-active back and forth, the clusterTimeStamp stays the same leading to apps getting the same ids as jobs from before. This leads to other races in staging directory etc. To avoid this, it is better to set it on every transition to Active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766690#comment-13766690 ] Karthik Kambatla commented on YARN-1183: Looks good to me. +1. Thanks Andrey. Observation: Not that we should change anything. We are storing the timestamp of when the appMaster registered, but not using it anywhere yet. MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, YARN-1183--n4.patch, YARN-1183.patch As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends it's last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1197) Add container merge support in YARN
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1197: - Description: Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. was: Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is, In some applications (like OpenMPI) has their own daemons in each node (one for each node) in their original implementation, and their user's processes are directly launched by its local daemon (like task-tracker in MRv1, but it's per-application). Many functionalities are depended on the pipes created when a process forked by its father, like IO-forwarding, process monitoring (it will do more logic than what NM did for us) and may cause some scalability issues. A very common resource request in MPI world is, give me 100G memory in the cluster, I will launch 100 processes in this resource. In current YARN, we have following two choices to make this happen, 1) Send allocation request with 1G memory iteratively, until we got 100G memories in total. Then ask NM launch such 100 MPI processes. That will cause some problems like cannot support IO-forwarding, processes monitoring, etc. as mentioned above. 2) Send a larger resource request, like 10G. But we may encounter following problems, 2.1 Such a large resource request is hard to get at one time. 2.2 We cannot use other resources more than the number we specified in the node (we can only launch one daemon in one node). 2.3 Hard to decide how much resource to ask. So my proposal is, 1) We can incrementally send resource request with small resources like before, until we get enough resources in total 2) Merge resource in the same node, make only one big container in each node 3) Launch daemons in each node, and the daemon will spawn its local processes and manage them. For example, We need to run 10 processes, 1G for each, finally we got container 1, 2, 3, 4, 5 in node1. container 6, 7, 8 in node2. container 9, 10 in node3. Then we will, merge [1, 2, 3, 4, 5] to container_11 with 5G, launch a daemon, and the daemon will launch 5 processes merge [6, 7, 8] to container_12 with 3G, launch a daemon, and the daemon will launch 3 processes merge [9, 10] to container_13 with 2G, launch a daemon, and the daemon will launch 2 processes Add container merge support in YARN --- Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-905: - Attachment: YARN-905-addendum.patch Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan Attachments: YARN-905-addendum.patch, YARN-905-addendum.patch, Yarn-905.patch, YARN-905.patch, YARN-905.patch It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1193) ResourceManger.clusterTimeStamp should be reset when RM transitions to active
[ https://issues.apache.org/jira/browse/YARN-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766684#comment-13766684 ] Karthik Kambatla commented on YARN-1193: [~bikassaha], looks like I misunderstood your comments on YARN-1027. The latest patch there (yarn-1027-7.patch) doesn't have this. clusterTimeStamp is a public static final variable in ResourceManager. To set it when transitioning to Active, we need to make it non-final, which would expose a public static variable. This change, I thought, should go with making it private and adding a public get method. Do you suggest I merge this back into YARN-1027? Or is it okay to handle it separately? I think it might be cleaner to handle it separately so it is easier for people to understand why a particular change has been made. ResourceManger.clusterTimeStamp should be reset when RM transitions to active - Key: YARN-1193 URL: https://issues.apache.org/jira/browse/YARN-1193 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1193-1.patch ResourceManager.clusterTimeStamp is used to generate application-ids. Currently, when the RM transitions to active-standby-active back and forth, the clusterTimeStamp stays the same leading to apps getting the same ids as jobs from before. This leads to other races in staging directory etc. To avoid this, it is better to set it on every transition to Active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
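A minimal sketch of the refactoring being discussed, with assumed names (the change that actually lands in YARN-1027 may differ): make the field private and non-final, expose a getter, and reset it on every transition to Active.
{code}
// Illustrative only; names and surrounding structure are assumptions.
private static long clusterTimeStamp = System.currentTimeMillis();

public static long getClusterTimeStamp() {
  return clusterTimeStamp;
}

private static void setClusterTimeStamp(long timestamp) {
  clusterTimeStamp = timestamp;
}

// Invoked when the RM becomes Active, so new application ids cannot collide with
// ids generated before the failover.
synchronized void transitionToActive() throws Exception {
  setClusterTimeStamp(System.currentTimeMillis());
  startActiveServices();   // hypothetical helper for the rest of the transition
}
{code}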
[jira] [Commented] (YARN-1197) Add container merge support in YARN
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766650#comment-13766650 ] Bikas Saha commented on YARN-1197: -- bq. 1) We can incrementally send resource request with small resources like before, until we get enough resources in total bq. 2) Merge resource in the same node, make only one big container in each node When the RM is asked for a container, this is what the RM already does: it incrementally adds reserved space on a node until it can allocate the full resources desired by the container, and then it assigns the container to the app. So it's not clear how making small allocations and then merging them in the app is going to help. By asking the RM directly for 10G of resources we can ensure that the RM will eventually give us that. If we ask for ten 1G resources, we are not guaranteed that the RM will give them to us on the same node, and thus the overall request may be unsatisfiable. Add container merge support in YARN --- Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1193) ResourceManger.clusterTimeStamp should be reset when RM transitions to active
[ https://issues.apache.org/jira/browse/YARN-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766701#comment-13766701 ] Bikas Saha commented on YARN-1193: -- The patch in YARN-1027 is incorrect without the clustertimestamp modification. However, we don't need to add a getClusterTimeStamp method in that patch itself; that can be done in a separate jira. So I meant that we can do the method refactoring separately. Sorry for not being clear. However, in the interest of getting YARN-1027 done I will take back those comments. So let's keep those changes in YARN-1027. I am closing this jira. ResourceManger.clusterTimeStamp should be reset when RM transitions to active - Key: YARN-1193 URL: https://issues.apache.org/jira/browse/YARN-1193 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1193-1.patch ResourceManager.clusterTimeStamp is used to generate application-ids. Currently, when the RM transitions to active-standby-active back and forth, the clusterTimeStamp stays the same leading to apps getting the same ids as jobs from before. This leads to other races in staging directory etc. To avoid this, it is better to set it on every transition to Active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-1193) ResourceManger.clusterTimeStamp should be reset when RM transitions to active
[ https://issues.apache.org/jira/browse/YARN-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved YARN-1193. -- Resolution: Duplicate Assignee: (was: Karthik Kambatla) YARN-1027 fixes this. ResourceManger.clusterTimeStamp should be reset when RM transitions to active - Key: YARN-1193 URL: https://issues.apache.org/jira/browse/YARN-1193 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Attachments: yarn-1193-1.patch ResourceManager.clusterTimeStamp is used to generate application-ids. Currently, when the RM transitions to active-standby-active back and forth, the clusterTimeStamp stays the same leading to apps getting the same ids as jobs from before. This leads to other races in staging directory etc. To avoid this, it is better to set it on every transition to Active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
[ https://issues.apache.org/jira/browse/YARN-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766401#comment-13766401 ] Hudson commented on YARN-1078: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #331 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/331/]) YARN-1078. TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1522644) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows - Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.1-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-1078.2.patch, YARN-1078.3.patch, YARN-1078.branch-2.patch, YARN-1078.patch The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! 
org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1197) Add container merge support in YARN
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766646#comment-13766646 ] Bikas Saha commented on YARN-1197: -- (Copying description into comments to reduce email size.) In some applications (like OpenMPI) has their own daemons in each node (one for each node) in their original implementation, and their user's processes are directly launched by its local daemon (like task-tracker in MRv1, but it's per-application). Many functionalities are depended on the pipes created when a process forked by its father, like IO-forwarding, process monitoring (it will do more logic than what NM did for us) and may cause some scalability issues. A very common resource request in MPI world is, give me 100G memory in the cluster, I will launch 100 processes in this resource. In current YARN, we have following two choices to make this happen, 1) Send allocation request with 1G memory iteratively, until we got 100G memories in total. Then ask NM launch such 100 MPI processes. That will cause some problems like cannot support IO-forwarding, processes monitoring, etc. as mentioned above. 2) Send a larger resource request, like 10G. But we may encounter following problems, 2.1 Such a large resource request is hard to get at one time. 2.2 We cannot use other resources more than the number we specified in the node (we can only launch one daemon in one node). 2.3 Hard to decide how much resource to ask. So my proposal is, 1) We can incrementally send resource request with small resources like before, until we get enough resources in total 2) Merge resource in the same node, make only one big container in each node 3) Launch daemons in each node, and the daemon will spawn its local processes and manage them. For example, We need to run 10 processes, 1G for each, finally we got container 1, 2, 3, 4, 5 in node1. container 6, 7, 8 in node2. container 9, 10 in node3. Then we will, merge [1, 2, 3, 4, 5] to container_11 with 5G, launch a daemon, and the daemon will launch 5 processes merge [6, 7, 8] to container_12 with 3G, launch a daemon, and the daemon will launch 3 processes merge [9, 10] to container_13 with 2G, launch a daemon, and the daemon will launch 2 processes Add container merge support in YARN --- Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1027) Implement RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1027: --- Attachment: yarn-1027-8.patch Including the patch from YARN-1193. Implement RMHAProtocolService - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, yarn-1027-6.patch, yarn-1027-7.patch, yarn-1027-7.patch, yarn-1027-8.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1189: Priority: Blocker (was: Major) NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-1189-20130912.1.patch, YARN-1189-20130913.txt The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766713#comment-13766713 ] Omkar Vinit Joshi commented on YARN-1189: - Looks good to me. Thanks, [~jlowe]. NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta, 2.1.1-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-1189-20130912.1.patch, YARN-1189-20130913.txt The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-975) Adding HDFS implementation for grouped reading and writing interfaces of history storage
[ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-975: --- Attachment: YARN-975.5.patch Attaching rebased patch. Thanks, Mayank Adding HDFS implementation for grouped reading and writing interfaces of history storage Key: YARN-975 URL: https://issues.apache.org/jira/browse/YARN-975 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch HDFS implementation should be a standard persistence strategy of history storage -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766738#comment-13766738 ] Daryn Sharp commented on YARN-1189: --- Oops, I thought the .1 patch was the latest so I didn't see the test. NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta, 2.1.1-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-1189-20130912.1.patch, YARN-1189-20130913.txt The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-975) Adding HDFS implementation for grouped reading and writing interfaces of history storage
[ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766774#comment-13766774 ] Mayank Bansal commented on YARN-975: Patch needs rebasing Adding HDFS implementation for grouped reading and writing interfaces of history storage Key: YARN-975 URL: https://issues.apache.org/jira/browse/YARN-975 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch HDFS implementation should be a standard persistence strategy of history storage -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1027) Implement RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766750#comment-13766750 ] Hadoop QA commented on YARN-1027: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603056/yarn-1027-8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1919//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1919//console This message is automatically generated. Implement RMHAProtocolService - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, yarn-1027-6.patch, yarn-1027-7.patch, yarn-1027-7.patch, yarn-1027-8.patch, yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1194) TestContainerLogsPage fails with native builds
[ https://issues.apache.org/jira/browse/YARN-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766638#comment-13766638 ] Roman Shaposhnik commented on YARN-1194: [~jlowe] thanks a lot for a quick review/commit! TestContainerLogsPage fails with native builds -- Key: YARN-1194 URL: https://issues.apache.org/jira/browse/YARN-1194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-1194.patch.txt Running TestContainerLogsPage on trunk while Native IO is enabled makes it fail -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
Omkar Vinit Joshi created YARN-1198: --- Summary: Capacity Scheduler headroom calculation does not work as expected Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Today the headroom calculation (for an app) takes place only when * a new node is added to or removed from the cluster * a new container is assigned to the application. However, there are potentially a lot of situations which are not considered in this calculation: * If a container finishes, the headroom for that application changes and the AM should be notified accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue, then ** if one of app1's containers finishes, not only app1's but also app2's AM should be notified about the change in headroom; ** similarly, if a container is assigned to either app1 or app2, both AMs should be notified of their new headroom; ** to simplify the whole communication process it would be ideal to keep headroom per user per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted to the same queue). * If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change. * Also, today headroom is an absolute number (I think it should be normalized, but that would not be backward compatible). * Also, when an admin refreshes the queues, headroom has to be updated. These are all potential bugs in the headroom calculation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
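To illustrate the "headroom per user per LeafQueue" idea from the YARN-1198 description above, a very rough sketch follows. The names and the simplified formula are assumptions; the real CapacityScheduler computation involves more inputs (user-limit-factor, cluster resource changes, and so on).
{code}
// Illustrative only: headroom = what the user may still get under the user limit,
// capped by what the queue itself can still hand out.
Resource computeUserHeadroom(LeafQueue queue, String user) {
  Resource userLimit = queue.computeUserLimit(user);                  // hypothetical helper
  Resource userConsumed = queue.getUser(user).getConsumedResources();
  Resource queueAvailable = Resources.subtract(
      queue.getMaximumCapacityResource(), queue.getUsedResources());  // hypothetical accessors
  return Resources.componentwiseMin(
      Resources.subtract(userLimit, userConsumed), queueAvailable);
}
{code}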
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766787#comment-13766787 ] Arun C Murthy commented on YARN-311: Can I get a few more days to review this? Thanks. Also, let's put this in 2.3.0 (not 2.1.1). Thanks. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. In this jira, we will only contain changes in scheduler. The flow to update node's resource and awareness in resource scheduling is: 1. Resource update is through admin API to RM and take effect on RMNodeImpl. 2. When next NM heartbeat for updating status comes, the RMNode's resource change will be aware and the delta resource is added to schedulerNode's availableResource before actual scheduling happens. 3. Scheduler do resource allocation according to new availableResource in SchedulerNode. For more design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-311: --- Target Version/s: 2.3.0 (was: 2.1.1-beta) Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. In this jira, we will only contain changes in scheduler. The flow to update node's resource and awareness in resource scheduling is: 1. Resource update is through admin API to RM and take effect on RMNodeImpl. 2. When next NM heartbeat for updating status comes, the RMNode's resource change will be aware and the delta resource is added to schedulerNode's availableResource before actual scheduling happens. 3. Scheduler do resource allocation according to new availableResource in SchedulerNode. For more design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
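A minimal sketch of the heartbeat-driven update flow described in the issue, under illustrative names (NodeView, memory-only resources); this is not the YARN-311 patch itself, only the delta-on-heartbeat idea:
{code}
public class NodeView {
    private long totalMb;      // total resource the RM believes the node has
    private long availableMb;  // resource still free for scheduling

    public NodeView(long totalMb) {
        this.totalMb = totalMb;
        this.availableMb = totalMb;
    }

    /** Called on the next NM heartbeat after an admin updated the node's resource. */
    public void applyResourceUpdate(long newTotalMb) {
        long deltaMb = newTotalMb - totalMb;   // may be negative when shrinking the node
        totalMb = newTotalMb;
        availableMb += deltaMb;                // scheduler sees the change before allocating
    }

    public long getAvailableMb() {
        return availableMb;
    }
}
{code}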
[jira] [Commented] (YARN-451) Add more metrics to RM page
[ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766759#comment-13766759 ] Sangjin Lee commented on YARN-451: -- I am pretty close to getting a patch ready for review on this. A quick question before that however: the proposed change contains changes in YARN (changes in message definition to carry this extra info, and subsequent UI changes) and mapreduce (mapreduce application providing this information). Should I create two sub-tasks (one for YARN and one for MAPREDUCE) and provide separate patches for them? Add more metrics to RM page --- Key: YARN-451 URL: https://issues.apache.org/jira/browse/YARN-451 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Priority: Minor ResourceManager webUI shows list of RUNNING applications, but it does not tell which applications are requesting more resource compared to others. With cluster running hundreds of applications at once it would be useful to have some kind of metric to show high-resource usage applications vs low-resource usage ones. At the minimum showing number of containers is good option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-540: - Attachment: YARN-540.8.patch bq. why didnt this code in the previous patch cause an exception to be thrown for a normal job? Because I added the check in RMAppRemovingTransition instead of FinalTransition. bq. Can the app crash while its waiting to be unregistered. Will that generate an ATTEMPT_FAILED? Can the node crash and cause an ATTEMPT_FAILED. The AppAttempt is already in the FINISHING state when the App is in the REMOVING state: if the app crashes, the attempt will receive a CONTAINER_FINISHED event and go to the FINISHED state. If the node crashes, the attempt should receive an EXPIRE event and go to the FINISHED state as well. bq. We probably need to save the previous state and return that while the app is in REMOVING state. Yes, I added a function to return the previous state while the App is in the REMOVING state. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch, YARN-540.7.patch, YARN-540.8.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
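The "return the previous state while the app is in REMOVING" idea can be sketched as follows; AppState and the field names here are illustrative stand-ins, not the RMAppImpl code:
{code}
public class AppStatusView {
    public enum AppState { RUNNING, REMOVING, FINISHED }

    private AppState state = AppState.RUNNING;
    private AppState stateBeforeRemoving = AppState.RUNNING;

    public void startRemoving() {
        stateBeforeRemoving = state;   // remember what clients were last told
        state = AppState.REMOVING;
    }

    /** What external callers see: the internal REMOVING state is hidden behind the previous state. */
    public AppState reportedState() {
        return state == AppState.REMOVING ? stateBeforeRemoving : state;
    }
}
{code}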
[jira] [Commented] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766728#comment-13766728 ] Daryn Sharp commented on YARN-1189: --- +1 but a test, even a mock that spies appFinished would be great to avoid a regression NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta, 2.1.1-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-1189-20130912.1.patch, YARN-1189-20130913.txt The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
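A hedged sketch of the kind of spy-based regression test suggested above. The TokenSecretManager and AppLifecycle types are stand-ins rather than the real NodeManager classes; only the Mockito spy/verify pattern is the point:
{code}
import static org.mockito.Mockito.spy;
import static org.mockito.Mockito.verify;

import org.junit.Test;

public class TestAppFinishedNotification {

    static class TokenSecretManager {
        void appFinished(String appId) { /* prune per-app state */ }
    }

    static class AppLifecycle {
        private final TokenSecretManager secretManager;
        AppLifecycle(TokenSecretManager secretManager) { this.secretManager = secretManager; }
        void finishApplication(String appId) {
            // ... other application teardown ...
            secretManager.appFinished(appId);   // the notification this JIRA is about
        }
    }

    @Test
    public void appFinishedIsForwardedToSecretManager() {
        TokenSecretManager manager = spy(new TokenSecretManager());
        new AppLifecycle(manager).finishApplication("app_0001");
        verify(manager).appFinished("app_0001");
    }
}
{code}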
[jira] [Commented] (YARN-953) [YARN-321] Change ResourceManager to use HistoryStorage to log history data
[ https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766872#comment-13766872 ] Hadoop QA commented on YARN-953: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603086/YARN-953.4.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1922//console This message is automatically generated. [YARN-321] Change ResourceManager to use HistoryStorage to log history data --- Key: YARN-953 URL: https://issues.apache.org/jira/browse/YARN-953 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: YARN-953.1.patch, YARN-953.2.patch, YARN-953.3.patch, YARN-953.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1189: - Attachment: YARN-1189-20130913.txt Thanks, Omkar. Patch looks good to me. Here's the patch with a unit test to make sure we're calling the token secret manager when the application is finished. NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-1189-20130912.1.patch, YARN-1189-20130913.txt The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-953) [YARN-321] Change ResourceManager to use HistoryStorage to log history data
[ https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-953: - Attachment: YARN-953.4.patch Rebase the patch [YARN-321] Change ResourceManager to use HistoryStorage to log history data --- Key: YARN-953 URL: https://issues.apache.org/jira/browse/YARN-953 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: YARN-953.1.patch, YARN-953.2.patch, YARN-953.3.patch, YARN-953.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766878#comment-13766878 ] Srimanth Gunturi commented on YARN-1001: What we need is a call like {{/appStates?type=mapreduce}}, and that would give per-state counts of the various MR apps. It should also include a total count of MR apps. Something like {noformat} { total: 10, submitted: 10, running: 3, pending: 4, completed: 2, killed: 1, failed: 1 } {noformat} YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1001.1.patch, YARN-1001.2.patch, YARN-1001.3.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
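For illustration only, a monitoring tool such as Ambari might query the proposed call like this; the base path, port, and response shape follow the comment above and are assumptions, not a committed YARN API:
{code}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AppStatsClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://rm-host:8088/ws/v1/cluster/appStates?type=mapreduce"))
            .header("Accept", "application/json")
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // e.g. {"total":10,"submitted":10,"running":3,...} per the proposal above
        System.out.println(response.body());
    }
}
{code}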
[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-978: --- Attachment: YARN-978.8.patch New patch adds TrackingUrl back, and remove the logUrl(This will be exposed by containerReport) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, YARN-978.8.patch We dont have ApplicationAttemptReport and Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766916#comment-13766916 ] Hadoop QA commented on YARN-978: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603089/YARN-978.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1923//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1923//console This message is automatically generated. [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, YARN-978.8.patch We dont have ApplicationAttemptReport and Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766926#comment-13766926 ] Bikas Saha commented on YARN-540: - bq. Because I added a check in RMAppRemovingTransition instead of FinalTransition The check in RMAppRemovingTransition will pass in the normal case because the app has unregistered and this is the first call to remove app. Then in the end when the app container exits then FinalTransition is called and there is no check at that time. so removeapp will be called a second time and the delete will throw an exception. Is that not the flow? Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch, YARN-540.7.patch, YARN-540.8.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-1178) TestContainerLogsPage#testContainerLogPageAccess is failing
[ https://issues.apache.org/jira/browse/YARN-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-1178. -- Resolution: Duplicate Marking this as a duplicate of YARN-1194 since that already has a patch posted. TestContainerLogsPage#testContainerLogPageAccess is failing --- Key: YARN-1178 URL: https://issues.apache.org/jira/browse/YARN-1178 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Test is failing after YARN-649. This test is only run in native mode mvn clean test -Pnative -Dtest=TestContainerLogsPage -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1189) NMTokenSecretManagerInNM is not being told when applications have finished
[ https://issues.apache.org/jira/browse/YARN-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766499#comment-13766499 ] Hadoop QA commented on YARN-1189: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603017/YARN-1189-20130913.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1913//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1913//console This message is automatically generated. NMTokenSecretManagerInNM is not being told when applications have finished --- Key: YARN-1189 URL: https://issues.apache.org/jira/browse/YARN-1189 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-1189-20130912.1.patch, YARN-1189-20130913.txt The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-905: - Attachment: YARN-905-addendum.patch Uploaded an addendum patch that fixes case-insensitive handling of "all" and the exception thrown on invalid input. Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan Attachments: YARN-905-addendum.patch, Yarn-905.patch, YARN-905.patch, YARN-905.patch It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
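An illustrative sketch of the behaviour the addendum targets (not the actual yarn CLI code): accept the state filter case-insensitively, treat "all" specially, and fail with a readable message on unknown values. NodeStateFilter and its enum are assumptions that only mirror a subset of the real NodeState values:
{code}
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

public class NodeStateFilter {
    public enum NodeState { NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED }

    public static Set<NodeState> parse(String arg) {
        if (arg.equalsIgnoreCase("all")) {
            return EnumSet.allOf(NodeState.class);   // "all", "ALL", "All" are all accepted
        }
        Set<NodeState> states = EnumSet.noneOf(NodeState.class);
        for (String token : arg.split(",")) {
            try {
                states.add(NodeState.valueOf(token.trim().toUpperCase(Locale.ENGLISH)));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException(
                    "Invalid node state: " + token + " (expected one of "
                    + EnumSet.allOf(NodeState.class) + " or 'all')");
            }
        }
        return states;
    }
}
{code}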
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766935#comment-13766935 ] Mayank Bansal commented on YARN-978: Looks good +1 Thanks, Mayank [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, YARN-978.8.patch We dont have ApplicationAttemptReport and Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1157: Attachment: YARN-1157.1.patch Trivial patch, no test cases added ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766943#comment-13766943 ] Jian He commented on YARN-540: -- bq. Is that not the flow? Yeah, I think I missed that in the previous patch. That previous patch should throw exception for a normal job.. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch, YARN-540.7.patch, YARN-540.8.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766950#comment-13766950 ] Zhijie Shen commented on YARN-1001: --- Having talked to [~srimanth.gunturi] offline, here are more specifications for the API: 1. The API takes exactly one applicationType for now. If no applicationType, or more than one, is specified, we throw an exception. We may support multiple applicationTypes in the future. 2. The API takes zero to many states. If no state is specified, we enumerate all states of RMApp and return the count for each state. If states are specified, we return only the counts for those states. 3. We output the results as follows: {code} <appStatInfo> <statItem> <state>submitted</state> <type>mapreduce</type> <count>10</count> </statItem> <statItem> <state>running</state> <type>mapreduce</type> <count>3</count> </statItem> </appStatInfo> {code} We don't list the total count separately, as it can be derived by summing up all the counts, and it does not fit the schema. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1001.1.patch, YARN-1001.2.patch, YARN-1001.3.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
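A minimal sketch of the aggregation described in points 1 and 2, using illustrative types (a simple App class and a reduced State enum instead of RMApp); an empty state set stands for "enumerate all states":
{code}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AppStateStats {
    public enum State { SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    public static class App {
        final String type;
        final State state;
        public App(String type, State state) { this.type = type; this.state = state; }
    }

    /** Count apps of the given type per state; an empty state set means "all states". */
    public static Map<State, Long> countByState(List<App> apps, String type, Set<State> states) {
        Set<State> wanted = states.isEmpty() ? EnumSet.allOf(State.class) : states;
        Map<State, Long> counts = new EnumMap<>(State.class);
        for (State s : wanted) {
            counts.put(s, 0L);            // zero counts still show up in the result
        }
        for (App app : apps) {
            if (app.type.equalsIgnoreCase(type) && counts.containsKey(app.state)) {
                counts.merge(app.state, 1L, Long::sum);
            }
        }
        return counts;
    }
}
{code}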
[jira] [Commented] (YARN-1184) ClassCastException is thrown during preemption When a huge job is submitted to a queue B whose resources is used by a job in queueA
[ https://issues.apache.org/jira/browse/YARN-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767039#comment-13767039 ] Hadoop QA commented on YARN-1184: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603114/Y1184-0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1924//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1924//console This message is automatically generated. ClassCastException is thrown during preemption When a huge job is submitted to a queue B whose resources is used by a job in queueA --- Key: YARN-1184 URL: https://issues.apache.org/jira/browse/YARN-1184 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.1.0-beta Reporter: J.Andreina Assignee: Devaraj K Fix For: 2.1.1-beta Attachments: Y1184-0.patch preemption is enabled. Queue = a,b a capacity = 30% b capacity = 70% Step 1: Assign a big job to queue a ( so that job_a will utilize some resources from queue b) Step 2: Assigne a big job to queue b. Following exception is thrown at Resource Manager {noformat} 2013-09-12 10:42:32,535 ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception. java.lang.ClassCastException: java.util.Collections$UnmodifiableSet cannot be cast to java.util.NavigableSet at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getContainersToPreempt(ProportionalCapacityPreemptionPolicy.java:403) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:202) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:173) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) at java.lang.Thread.run(Thread.java:662) {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
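The stack trace above comes from casting a Collections.unmodifiableSet wrapper back to NavigableSet, which the wrapper does not implement. Below is a small standalone demonstration and a defensive alternative (copying into a TreeSet, or using Collections.unmodifiableNavigableSet on Java 8+); it is illustrative only and not the ProportionalCapacityPreemptionPolicy fix:
{code}
import java.util.Collections;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class UnmodifiableNavigableDemo {
    public static void main(String[] args) {
        NavigableSet<Integer> containers = new TreeSet<>(Set.of(1, 2, 3));
        Set<Integer> readOnly = Collections.unmodifiableSet(containers);

        // (NavigableSet<Integer>) readOnly  -> ClassCastException, as in the report above

        // Safe alternative: take a sorted copy when a NavigableSet view is really needed.
        NavigableSet<Integer> sortedCopy = new TreeSet<>(readOnly);
        System.out.println(sortedCopy.descendingSet());
    }
}
{code}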
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767095#comment-13767095 ] Junping Du commented on YARN-311: - Sure. Thanks for review. Arun! Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. In this jira, we will only contain changes in scheduler. The flow to update node's resource and awareness in resource scheduling is: 1. Resource update is through admin API to RM and take effect on RMNodeImpl. 2. When next NM heartbeat for updating status comes, the RMNode's resource change will be aware and the delta resource is added to schedulerNode's availableResource before actual scheduling happens. 3. Scheduler do resource allocation according to new availableResource in SchedulerNode. For more design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1170) yarn proto definitions should specify package as 'hadoop.yarn'
[ https://issues.apache.org/jira/browse/YARN-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1170: Assignee: Binglin Chang yarn proto definitions should specify package as 'hadoop.yarn' -- Key: YARN-1170 URL: https://issues.apache.org/jira/browse/YARN-1170 Project: Hadoop YARN Issue Type: Bug Reporter: Arun C Murthy Assignee: Binglin Chang Priority: Blocker Attachments: YARN-1170.v1.patch yarn proto definitions should specify package as 'hadoop.yarn' similar to protos with 'hadoop.common' 'hadoop.hdfs' in Common HDFS respectively. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1170) yarn proto definitions should specify package as 'hadoop.yarn'
[ https://issues.apache.org/jira/browse/YARN-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1170: Attachment: YARN-1170.v1.patch yarn proto definitions should specify package as 'hadoop.yarn' -- Key: YARN-1170 URL: https://issues.apache.org/jira/browse/YARN-1170 Project: Hadoop YARN Issue Type: Bug Reporter: Arun C Murthy Assignee: Binglin Chang Priority: Blocker Attachments: YARN-1170.v1.patch, YARN-1170.v1.patch yarn proto definitions should specify package as 'hadoop.yarn' similar to protos with 'hadoop.common' 'hadoop.hdfs' in Common HDFS respectively. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767124#comment-13767124 ] Hadoop QA commented on YARN-540: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603143/YARN-540.9.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1927//console This message is automatically generated. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch, YARN-540.7.patch, YARN-540.8.patch, YARN-540.9.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1170) yarn proto definitions should specify package as 'hadoop.yarn'
[ https://issues.apache.org/jira/browse/YARN-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767123#comment-13767123 ] Arun C Murthy commented on YARN-1170: - +1, I'll commit after jenkins is back. Thanks [~decster]! yarn proto definitions should specify package as 'hadoop.yarn' -- Key: YARN-1170 URL: https://issues.apache.org/jira/browse/YARN-1170 Project: Hadoop YARN Issue Type: Bug Reporter: Arun C Murthy Assignee: Binglin Chang Priority: Blocker Attachments: YARN-1170.v1.patch, YARN-1170.v1.patch yarn proto definitions should specify package as 'hadoop.yarn' similar to protos with 'hadoop.common' 'hadoop.hdfs' in Common HDFS respectively. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-540: - Attachment: YARN-540.9.patch Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch, YARN-540.7.patch, YARN-540.8.patch, YARN-540.9.patch, YARN-540.9.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1170) yarn proto definitions should specify package as 'hadoop.yarn'
[ https://issues.apache.org/jira/browse/YARN-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767160#comment-13767160 ] Hadoop QA commented on YARN-1170: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603144/YARN-1170.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1926//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1926//console This message is automatically generated. yarn proto definitions should specify package as 'hadoop.yarn' -- Key: YARN-1170 URL: https://issues.apache.org/jira/browse/YARN-1170 Project: Hadoop YARN Issue Type: Bug Reporter: Arun C Murthy Assignee: Binglin Chang Priority: Blocker Attachments: YARN-1170.v1.patch, YARN-1170.v1.patch yarn proto definitions should specify package as 'hadoop.yarn' similar to protos with 'hadoop.common' 'hadoop.hdfs' in Common HDFS respectively. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767159#comment-13767159 ] Hadoop QA commented on YARN-1001: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603145/YARN-1001.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1925//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1925//console This message is automatically generated. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1001.1.patch, YARN-1001.2.patch, YARN-1001.3.patch, YARN-1001.4.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767162#comment-13767162 ] Xuan Gong commented on YARN-1001: - One quick comment: Do we suppose to use YarnApplicationState instead of exposing the real RMApp state ? It is true that most of them are one-to-one matching. Also ApplicationCLI uses YarnApplicationState, do they need to be consistent ? YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1001.1.patch, YARN-1001.2.patch, YARN-1001.3.patch, YARN-1001.4.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira