date:20140731


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080574#comment-14080574
 ] 

Zhijie Shen commented on YARN-2347:
---

+1. The last patch LGTM. Will commit the patch

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, 
 YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-07-31 Thread Tsuyoshi OZAWA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080576#comment-14080576
 ] 

Tsuyoshi OZAWA commented on YARN-2229:
--

Thanks for the comment, Jian and Anubhav. I took same time to understand what 
we're thinking.

If we'll have a choice to preserve cluster-level compatibility for rolling 
updates, the current design(v10) is not acceptable. We should choose the first 
design as Sid and Zhijie mentioned in this case. However, before starting to 
think cluster-level compatibility at container id level, we don't support 
rolling update because ContainerToken don't have compatibility. Jian's 
suggestion cares it - we don't need to preserve cluster-level compatibility 
currently because we cannot support rolling update. Therefore, I think the 
current design(v10), is acceptable. [~sseth], [~zjshen], [~adhoot], [~jianhe], 
do you agree with v10 design based on these discussion?

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, 
 YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, 
 YARN-2229.8.patch, YARN-2229.9.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM 
 restarts 1024 times.
 To avoid the problem, its better to make containerId long. We need to define 
 the new format of container Id with preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document


[ 
https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080582#comment-14080582
 ] 

Hadoop QA commented on YARN-2372:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658851/YARN-2372.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4490//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4490//console

This message is automatically generated.

 There are Chinese Characters in the FairScheduler's document
 

 Key: YARN-2372
 URL: https://issues.apache.org/jira/browse/YARN-2372
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.1
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, 
 YARN-2372.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080587#comment-14080587
 ] 

Zhijie Shen commented on YARN-2347:
---

Sorry for raising another issues so late. When I try to commit the patch, I 
realize ShuffleHandler from MR project has a reference to Version. In this case,
{code}
@Private
@Unstable
public abstract class Version {
{code}
\@Private annotation seems not to be accurate. Moreover, other applications may 
implement their AuxiliaryService as well, right? In this case, their 
AuxiliaryService is likely to use Version as ShuffleHandler does. Therefore, 
should Version be \@Public instead, and be part of o.a.h.y.api.records in 
hadoop-yarn-api?

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, 
 YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2014-07-31 Thread duanfa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080591#comment-14080591
 ] 

duanfa commented on YARN-1149:
--

i want ask : which hadoop version did you change base?  hadoop 2.0.x.x or 
hadoop 2.1.x.x
please send to my email  duanfa1...@gmail.com

thanks!!!

 NM throws InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 -

 Key: YARN-1149
 URL: https://issues.apache.org/jira/browse/YARN-1149
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.2.0

 Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
 YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, 
 YARN-1149.8.patch, YARN-1149.9.patch, YARN-1149_branch-2.1-beta.1.patch


 When nodemanager receives a kill signal when an application has finished 
 execution but log aggregation has not kicked in, 
 InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
 {noformat}
 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
 finished : application_1377459190746_0118
 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
 log-file for app application_1377459190746_0118 at 
 /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
 to complete for application_1377459190746_0118
 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
 container container_1377459190746_0118_01_04. Current good log dirs are 
 /tmp/yarn/local
 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
 log-file for app application_1377459190746_0118
 2013-08-25 20:45:00,925 WARN  application.Application 
 (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
 at java.lang.Thread.run(Thread.java:662)
 2013-08-25 20:45:00,926 INFO  application.Application 
 (ApplicationImpl.java:handle(430)) - Application 
 application_1377459190746_0118 transitioned from RUNNING to null
 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(463)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 8040
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically


[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080598#comment-14080598
 ] 

Xuan Gong commented on YARN-2212:
-

bq. AMS#registerApplicationMaster changes not needed.

make changes on authorizeRequest() to let it return AMRMTokenIdentifier instead 
of RMAppAttemptId. So, all the changes in AMS#registerApplicationMaster are for 
this.

bq. May not say stable now.
{code}
  @Stable
  public abstract Token getAMRMToken();
{code}

DONE

bq. ApplicationReport#getAMRMToken for unmanaged AM needs to be updated as well.

When AMRMToken is rolled up, we will update the AMRMToken for current attempt. 
So, ApplicationReport#getAMRMToken will update

bq. we can move the AMRMToken creation from RMAppAttemptImpl to AMLauncher?
 
DONE

bq. Use newInstance instead. 

DONE

bq. Test AMRMClient automatically takes care of the new AMRMToken transfer.

ADDED

bq. Please run on real cluster also and set roll-over interval to a small value 
to make sure it actually works.

tested.

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080602#comment-14080602
 ] 

Junping Du commented on YARN-2347:
--

Thanks for review and comments, [~zjshen]! That's good point and I agree it is 
possible to be used in future by other applications. However, before the real 
requirements comes in (as applications don't have to follow our practice in 
YARN for versioning), let's play safe to keep it as private as it is mostly 
used among YARN and built-in MR components. We can easily to make an API public 
from private in future, but making a public API back to private (or change 
interfaces) should never happen. So, IMO, it is better to keep it as private at 
this moment. We can open a separated JIRA (and work) to discuss more if you 
have a strong feeling to public it. Thoughts?

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, 
 YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Assigned] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal


 [ 
https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-1572:


Assignee: Wenwu Peng  (was: Junping Du)

[~gujilangzi], are you working on this? If so, assign this JIRA to you. Please 
attach the log of NPE for latest trunk, I will also help to look at it. Thx! 

 Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
 --

 Key: YARN-1572
 URL: https://issues.apache.org/jira/browse/YARN-1572
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: conf.tar.gz, log.tar.gz


 we have lower chance to hit NPE in allocateNodeLocal  when run benchmark(hit 
 4 in 20 times).
 Steps:
 1. setup hadoop 2.2.0 environment
 2. Run for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar 
 /hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar
  org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep 
 10;done
 2014-01-08 03:56:14,082 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:662)
 will attach log and configure files later
 Note: 
 My topology file:
 10.111.89.230   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
 10.111.89.231   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
 10.111.89.232   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
 10.111.89.239   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
 10.111.89.233   /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
 10.111.89.234   /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
 10.111.89.240   /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
 10.111.89.236   /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
 10.111.89.241   /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
 10.111.89.238   /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com
 10.111.89.242   /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically


 [ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2212:


Attachment: YARN-2212.5.patch

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080613#comment-14080613
 ] 

Junping Du commented on YARN-2347:
--

Sounds good. Will upload a new patch soon. Thx!

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, 
 YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080610#comment-14080610
 ] 

Zhijie Shen commented on YARN-2347:
---

Make sense. As MR has already used Version, should we at least mark Version as 
\@LimitedPrivate(\{YARN, MAPREDUCE\})?

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, 
 YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


 [ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2347:
-

Attachment: YARN-2347-v6.patch

Address latest comments from [~zjshen] in v6 patch.

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, 
 YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347-v6.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell

2014-07-31 Thread Varun Vasudev (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2374:


Attachment: apache-yarn-2374.0.patch

Patch with debug information added to figure out root cause.

 YARN trunk build failing TestDistributedShell.testDSShell
 -

 Key: YARN-2374
 URL: https://issues.apache.org/jira/browse/YARN-2374
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2374.0.patch


 The YARN trunk build has been failing for the last few days in the 
 distributed shell module.
 {noformat}
 testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 27.269 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal

2014-07-31 Thread Wenwu Peng (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenwu Peng updated YARN-1572:
-

Attachment: YARN-1572-log.tar.gz

Thanks a lot Junping! please refer to YARN-1572-log.tar.gz  for the log of NPE 
for latest trunk.

java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:311)


 Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
 --

 Key: YARN-1572
 URL: https://issues.apache.org/jira/browse/YARN-1572
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-1572-log.tar.gz, conf.tar.gz, log.tar.gz


 we have lower chance to hit NPE in allocateNodeLocal  when run benchmark(hit 
 4 in 20 times).
 Steps:
 1. setup hadoop 2.2.0 environment
 2. Run for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar 
 /hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar
  org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep 
 10;done
 2014-01-08 03:56:14,082 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:662)
 will attach log and configure files later
 Note: 
 My topology file:
 10.111.89.230   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
 10.111.89.231   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
 10.111.89.232   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
 10.111.89.239   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
 10.111.89.233   /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
 10.111.89.234   /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
 10.111.89.240   /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
 10.111.89.236   /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
 10.111.89.241   /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
 10.111.89.238   /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com
 10.111.89.242   /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell


[ 
https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080655#comment-14080655
 ] 

Hadoop QA commented on YARN-2374:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12658863/apache-yarn-2374.0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell:

  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4493//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4493//console

This message is automatically generated.

 YARN trunk build failing TestDistributedShell.testDSShell
 -

 Key: YARN-2374
 URL: https://issues.apache.org/jira/browse/YARN-2374
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2374.0.patch


 The YARN trunk build has been failing for the last few days in the 
 distributed shell module.
 {noformat}
 testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 27.269 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal

2014-07-31 Thread Wenwu Peng (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenwu Peng updated YARN-1572:
-

Description: 
we have lower chance to hit NPE in allocateNodeLocal  when run benchmark(hit 4 
in 20 times).

2014-07-31 04:18:19,653 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned 
container container_1406794589275_0001_01_21 of capacity memory:1024, 
vCores:1 on host datanode10:57281, which has 6 containers, memory:6144, 
vCores:6 used and memory:2048, vCores:2 available after allocation
2014-07-31 04:18:19,654 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:311)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:268)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:683)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:602)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:560)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:488)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:729)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:774)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:101)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
at java.lang.Thread.run(Thread.java:662)
2014-07-31 04:18:19,655 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



  was:
we have lower chance to hit NPE in allocateNodeLocal  when run benchmark(hit 4 
in 20 times).

Steps:
1. setup hadoop 2.2.0 environment
2. Run for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar 
/hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar
 org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep 10;done


2014-01-08 03:56:14,082 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:662)

will attach log and configure files later

Note: 
My topology file:
10.111.89.230   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.231   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.232   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.239   /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.233   /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.234   /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.240   /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.236   /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
10.111.89.241

[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically

[
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080665#comment-14080665
]

Hadoop QA commented on YARN-2212:
-

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12658859/YARN-2212.5.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 7 new
or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebApp
org.apache.hadoop.yarn.client.TestResourceTrackerOnHA

org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnAMRMTokenRollOver
org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
org.apache.hadoop.yarn.client.TestRMFailover
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
org.apache.hadoop.yarn.client.api.impl.TestNMClient
org.apache.hadoop.yarn.client.TestGetGroups

org.apache.hadoop.yarn.client.TestResourceManagerAdministrationProtocolPBClientImpl

org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
org.apache.hadoop.yarn.client.api.impl.TestYarnClient

org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication

org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication

org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs

org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens

org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs

org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps

org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA

org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-YARN-Build/4491//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4491//console

This message is automatically generated.

ApplicationMaster needs to find a way to update the AMRMToken periodically
--

Key: YARN-2212
URL: https://issues.apache.org/jira/browse/YARN-2212
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-2212.1.patch, YARN-2212.2.patch,
YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080669#comment-14080669
 ] 

Hadoop QA commented on YARN-2347:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658862/YARN-2347-v6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4492//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4492//console

This message is automatically generated.

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, 
 YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347-v6.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document

2014-07-31 Thread Fengdong Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated YARN-2372:
--

Attachment: YARN-2372.patch

 There are Chinese Characters in the FairScheduler's document
 

 Key: YARN-2372
 URL: https://issues.apache.org/jira/browse/YARN-2372
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.1
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, 
 YARN-2372.patch, YARN-2372.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080680#comment-14080680
 ] 

Hudson commented on YARN-2347:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5991 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5991/])
YARN-2347. Consolidated RMStateVersion and NMDBSchemaVersion into Version in 
yarn-server-common. Contributed by Junping Du. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1614838)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/Version.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb/VersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMStateVersion.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/RMStateVersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common

[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document


[ 
https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080689#comment-14080689
 ] 

Hadoop QA commented on YARN-2372:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658877/YARN-2372.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4494//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4494//console

This message is automatically generated.

 There are Chinese Characters in the FairScheduler's document
 

 Key: YARN-2372
 URL: https://issues.apache.org/jira/browse/YARN-2372
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.1
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, 
 YARN-2372.patch, YARN-2372.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls


[ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080716#comment-14080716
 ] 

Junping Du commented on YARN-2051:
--

Hi [~decster], thanks for working on this. I will review your patch ASAP.

 Fix code bug and add more unit tests for PBImpls
 

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical
 Attachments: YARN-2051.v1.patch


 From YARN-2016, we can see some bug could exist in PB implementation of 
 protocol. The bad news is most of these PBImpl don't have any unit test to 
 verify the info is not lost or changed after serialization/deserialization. 
 We should add more tests for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls


[ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080719#comment-14080719
 ] 

Junping Du commented on YARN-2051:
--

Forget to mention, +1 on the idea for testing these PB objects automatically. I 
love it so much! :) 

 Fix code bug and add more unit tests for PBImpls
 

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical
 Attachments: YARN-2051.v1.patch


 From YARN-2016, we can see some bug could exist in PB implementation of 
 protocol. The bad news is most of these PBImpl don't have any unit test to 
 verify the info is not lost or changed after serialization/deserialization. 
 We should add more tests for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080722#comment-14080722
 ] 

Hudson commented on YARN-2347:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #629 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/629/])
YARN-2347. Consolidated RMStateVersion and NMDBSchemaVersion into Version in 
yarn-server-common. Contributed by Junping Du. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1614838)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/Version.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb/VersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMStateVersion.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/RMStateVersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common

[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls


[ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080727#comment-14080727
 ] 

Junping Du commented on YARN-2051:
--

Not sure if patch is still updated, manually kick off Jenkins test again.

 Fix code bug and add more unit tests for PBImpls
 

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical
 Attachments: YARN-2051.v1.patch


 From YARN-2016, we can see some bug could exist in PB implementation of 
 protocol. The bad news is most of these PBImpl don't have any unit test to 
 verify the info is not lost or changed after serialization/deserialization. 
 We should add more tests for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document


[ 
https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080735#comment-14080735
 ] 

Junping Du commented on YARN-2372:
--

Nice catch, [~azuryy]! Actually, these are special punctuation from Chinese 
input which is hardly to find. 
+1 on the patch. [~azuryy], any more places with the same issue? If not, I will 
commit it shortly.

 There are Chinese Characters in the FairScheduler's document
 

 Key: YARN-2372
 URL: https://issues.apache.org/jira/browse/YARN-2372
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.1
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, 
 YARN-2372.patch, YARN-2372.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls


[ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080750#comment-14080750
 ] 

Hadoop QA commented on YARN-2051:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655677/YARN-2051.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4495//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4495//console

This message is automatically generated.

 Fix code bug and add more unit tests for PBImpls
 

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical
 Attachments: YARN-2051.v1.patch


 From YARN-2016, we can see some bug could exist in PB implementation of 
 protocol. The bad news is most of these PBImpl don't have any unit test to 
 verify the info is not lost or changed after serialization/deserialization. 
 We should add more tests for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls


[ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080780#comment-14080780
 ] 

Junping Du commented on YARN-2051:
--

Again, good work, [~decster]!
Some comments below, most of them are trivial:

{code}
+System.out.printf(Validate %s %s\n, recordClass.getName(),
+protoClass.getName());
{code}
Please replace this and other places that try to print to console with LOG.

{code}
+ret = Sets.newHashSet(genTypeValue(params[0]));
{code}
Please remove unnecessary space in the end of this line.

{code}
throw new IllegalArgumentException(type not support:  + type);
{code}
May be type:  + type +  is not supported is more readable?

{code}
+  private static Object genByNewInstance(Class clazz) throws Exception {
{code}
generateNewInstance() sounds like a better name?

{code}
ret = newInstance.invoke(null, args);
{code}
The code here has risk of NPE if newInstance method is not found previously (it 
is possible, as newInstance() method is not forced to have, although most class 
obey this rule). Better to add some exception handling here.

{code}
+  } else if (clazz.equals(ByteBuffer.class)) {
+// return new ByteBuffer every time
+// to prevent potential side effects
+return ByteBuffer.allocate(4);
+  }
{code}
What's reasonable value we generate here for ByteBuffer? Just empty. Isn't it?


 Fix code bug and add more unit tests for PBImpls
 

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical
 Attachments: YARN-2051.v1.patch


 From YARN-2016, we can see some bug could exist in PB implementation of 
 protocol. The bad news is most of these PBImpl don't have any unit test to 
 verify the info is not lost or changed after serialization/deserialization. 
 We should add more tests for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2283) RM failed to release the AM container

2014-07-31 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080804#comment-14080804
 ] 

Sunil G commented on YARN-2283:
---

Seems to be duplicate to MAPREDUCE-5888 
[~jlowe] cud u pls confirm whether its the same issue.

 RM failed to release the AM container
 -

 Key: YARN-2283
 URL: https://issues.apache.org/jira/browse/YARN-2283
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
 Environment: NM1: AM running
 NM2: Map task running
 mapreduce.map.maxattempts=1
Reporter: Nishan Shetty
Priority: Critical

 During container stability test i faced this problem
 While job is running map task got killed
 Observe that eventhough application is FAILED MRAppMaster process is running 
 till timeout because RM did not release  the AM container
 {code}
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1405318134611_0002_01_05 Container Transitioned from RUNNING to 
 COMPLETED
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
  Completed container: container_1405318134611_0002_01_05 in state: 
 COMPLETED event:FINISHED
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos 
 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
 APPID=application_1405318134611_0002
 CONTAINERID=container_1405318134611_0002_01_05
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  Finish information of container container_1405318134611_0002_01_05 is 
 written
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: 
 Stored the finish data of container container_1405318134611_0002_01_05
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
  Released container container_1405318134611_0002_01_05 of capacity 
 memory:1024, vCores:1 on host HOST-10-18-40-153:45026, which currently has 
 1 containers, memory:2048, vCores:1 used and memory:6144, vCores:7 
 available, release resources=true
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 default used=memory:2048, vCores:1 numContainers=1 user=testos 
 user-resources=memory:2048, vCores:1
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 completedContainer container=Container: [ContainerId: 
 container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, 
 NodeHttpAddress: HOST-10-18-40-153:45025, Resource: memory:1024, vCores:1, 
 Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 
 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:2048, vCores:1, usedCapacity=0.25, 
 absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=memory:8192, 
 vCores:8
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 
 used=memory:2048, vCores:1 cluster=memory:8192, vCores:8
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Re-sorting completed queue: root.default stats: default: capacity=1.0, 
 absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, 
 usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application attempt appattempt_1405318134611_0002_01 released container 
 container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 
 #containers=1 available=6144 used=2048 with event: FINISHED
 2014-07-14 14:43:34,924 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Updating application attempt appattempt_1405318134611_0002_01 with final 
 state: FINISHING
 2014-07-14 14:43:34,924 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
 2014-07-14 14:43:34,924 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating 
 application application_1405318134611_0002 with final state: FINISHING
 2014-07-14 14:43:34,947 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Watcher event type: NodeDataChanged with state:SyncConnected for

[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell

2014-07-31 Thread Varun Vasudev (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080805#comment-14080805
 ] 

Varun Vasudev commented on YARN-2374:
-

From Jenkins:
{noformat}
TestDistributedShell.testDSShell:193 Expected host name to start with 
'asf905.gq1.ygridcore.net/67.195.81.149', was 'asf905/67.195.81.149'. Expected 
rpc port to be '-1', was '-1'.
{noformat}

It looks like the calls to NetUtils.getHostName() can return a short name or a 
fully qualified domain name. I'm not sure how to resolve this. The test code 
and the code in distributed shell app master call NetUtils.getHostName() and 
are getting different results. One solution could be to modify both the 
distributed shell app master and the test to use fully qualified domain names, 
but I'm open to suggestions.

 YARN trunk build failing TestDistributedShell.testDSShell
 -

 Key: YARN-2374
 URL: https://issues.apache.org/jira/browse/YARN-2374
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2374.0.patch


 The YARN trunk build has been failing for the last few days in the 
 distributed shell module.
 {noformat}
 testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 27.269 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls


[ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080829#comment-14080829
 ] 

Hadoop QA commented on YARN-2051:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658915/YARN-2051.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4496//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4496//console

This message is automatically generated.

 Fix code bug and add more unit tests for PBImpls
 

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical
 Attachments: YARN-2051.v1.patch, YARN-2051.v2.patch


 From YARN-2016, we can see some bug could exist in PB implementation of 
 protocol. The bad news is most of these PBImpl don't have any unit test to 
 verify the info is not lost or changed after serialization/deserialization. 
 We should add more tests for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Resolved] (YARN-2283) RM failed to release the AM container

2014-07-31 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-2283.
--

Resolution: Duplicate

Yes, it is very likely a duplicate of MAPREDUCE-5888, especially since it no 
longer reproduces on later releases.  Resolving as a duplicate.

The RM is not failing to release the container, rather the RM is intentionally 
giving the AM some time to clean things up after unregistering (i.e.: the 
FINISHING state).  Unfortunately before MAPREDUCE-5888 was fixed the AM could 
hang during a failed job because of a non-daemon thread that was lingering 
around and preventing the JVM from shutting down.  The RM eventually decides 
that the AM has used too much time to cleanup and kills it.

 RM failed to release the AM container
 -

 Key: YARN-2283
 URL: https://issues.apache.org/jira/browse/YARN-2283
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
 Environment: NM1: AM running
 NM2: Map task running
 mapreduce.map.maxattempts=1
Reporter: Nishan Shetty
Priority: Critical

 During container stability test i faced this problem
 While job is running map task got killed
 Observe that eventhough application is FAILED MRAppMaster process is running 
 till timeout because RM did not release  the AM container
 {code}
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1405318134611_0002_01_05 Container Transitioned from RUNNING to 
 COMPLETED
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
  Completed container: container_1405318134611_0002_01_05 in state: 
 COMPLETED event:FINISHED
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos 
 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
 APPID=application_1405318134611_0002
 CONTAINERID=container_1405318134611_0002_01_05
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  Finish information of container container_1405318134611_0002_01_05 is 
 written
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: 
 Stored the finish data of container container_1405318134611_0002_01_05
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
  Released container container_1405318134611_0002_01_05 of capacity 
 memory:1024, vCores:1 on host HOST-10-18-40-153:45026, which currently has 
 1 containers, memory:2048, vCores:1 used and memory:6144, vCores:7 
 available, release resources=true
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 default used=memory:2048, vCores:1 numContainers=1 user=testos 
 user-resources=memory:2048, vCores:1
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 completedContainer container=Container: [ContainerId: 
 container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, 
 NodeHttpAddress: HOST-10-18-40-153:45025, Resource: memory:1024, vCores:1, 
 Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 
 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:2048, vCores:1, usedCapacity=0.25, 
 absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=memory:8192, 
 vCores:8
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 
 used=memory:2048, vCores:1 cluster=memory:8192, vCores:8
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Re-sorting completed queue: root.default stats: default: capacity=1.0, 
 absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, 
 usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application attempt appattempt_1405318134611_0002_01 released container 
 container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 
 #containers=1 available=6144 used=2048 with event: FINISHED
 2014-07-14 14:43:34,924 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Updating application attempt appattempt_1405318134611_0002_01 with final 
 state: FINISHING
 2014-07-14 14:43:34,924 INFO

[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell

2014-07-31 Thread Naganarasimha G R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080914#comment-14080914
 ] 

Naganarasimha G R commented on YARN-2374:
-

I dont know how much this might help . refer this link with jvm bug 
http://bugs.java.com/view_bug.do?bug_id=7166687;. 

 YARN trunk build failing TestDistributedShell.testDSShell
 -

 Key: YARN-2374
 URL: https://issues.apache.org/jira/browse/YARN-2374
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2374.0.patch


 The YARN trunk build has been failing for the last few days in the 
 distributed shell module.
 {noformat}
 testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 27.269 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080922#comment-14080922
 ] 

Hudson commented on YARN-2347:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1823 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1823/])
YARN-2347. Consolidated RMStateVersion and NMDBSchemaVersion into Version in 
yarn-server-common. Contributed by Junping Du. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1614838)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/Version.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb/VersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMStateVersion.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/RMStateVersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common

[jira] [Commented] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running


[ 
https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080925#comment-14080925
 ] 

Junping Du commented on YARN-2371:
--

Nice finding, [~zhiguohong]! The fix here looks reasonable to me. It reminds me 
that we also have recently changes to replace checking appAttemptID with 
checking appID in authorizing NMToken for the similar reason. For unit test, I 
suggest to have a separated test method or at least  separated code segment for 
your case with proper document on scenario of cases.

 Wrong NMToken is issued when NM preserving restarts with containers running
 ---

 Key: YARN-2371
 URL: https://issues.apache.org/jira/browse/YARN-2371
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
 Attachments: YARN-2371.patch


 When application is submitted with 
 ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == 
 true, and NM is restarted with containers running, wrong NMToken is issued 
 to AM through RegisterApplicationMasterResponse.
 See the NM log:
 {code}
 2014-07-30 11:59:58,941 ERROR 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
  Unauthorized request to start container.-
 NMToken for application attempt : appattempt_1406691610864_0002_01 was 
 used for starting container with container token issued for application 
 attempt : appattempt_1406691610864_0002_02
 {code}
 The reason is in below code:
 {code} 
 createAndGetNMToken(String applicationSubmitter,
   ApplicationAttemptId appAttemptId, Container container) {
   ..
   Token token =
   createNMToken(container.getId().getApplicationAttemptId(),
 container.getNodeId(), applicationSubmitter);
  ..
 }
 {code} 
 appAttemptId instead of container.getId().getApplicationAttemptId() 
 should be passed to createNMToken.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running


[ 
https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080931#comment-14080931
 ] 

Junping Du commented on YARN-2371:
--

Look at the code on trunk again. It looks like the check is already on appID 
rather than appAttemptID, so exception in description above shouldn't happen on 
latest trunk if only appAttemptID is different. [~zhiguohong], are you using 
trunk to have this exception or some previous released version?
{code}
if (!nmTokenIdentifier.getApplicationAttemptId().getApplicationId().equals(
containerId.getApplicationAttemptId().getApplicationId())) {
  unauthorized = true;
  messageBuilder.append(\nNMToken for application attempt : )
.append(nmTokenIdentifier.getApplicationAttemptId())
.append( was used for starting container with container token)
.append( issued for application attempt : )
.append(containerId.getApplicationAttemptId());
}
{code}
Though, the message should be improved to reflect applicationID but not 
attemptID.

 Wrong NMToken is issued when NM preserving restarts with containers running
 ---

 Key: YARN-2371
 URL: https://issues.apache.org/jira/browse/YARN-2371
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
 Attachments: YARN-2371.patch


 When application is submitted with 
 ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == 
 true, and NM is restarted with containers running, wrong NMToken is issued 
 to AM through RegisterApplicationMasterResponse.
 See the NM log:
 {code}
 2014-07-30 11:59:58,941 ERROR 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
  Unauthorized request to start container.-
 NMToken for application attempt : appattempt_1406691610864_0002_01 was 
 used for starting container with container token issued for application 
 attempt : appattempt_1406691610864_0002_02
 {code}
 The reason is in below code:
 {code} 
 createAndGetNMToken(String applicationSubmitter,
   ApplicationAttemptId appAttemptId, Container container) {
   ..
   Token token =
   createNMToken(container.getId().getApplicationAttemptId(),
 container.getNodeId(), applicationSubmitter);
  ..
 }
 {code} 
 appAttemptId instead of container.getId().getApplicationAttemptId() 
 should be passed to createNMToken.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-07-31 Thread Vinod Kumar Vavilapalli (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080936#comment-14080936
]

Vinod Kumar Vavilapalli commented on YARN-2198:
---

Skimmed through the Windows native code and the common changes, look fine
overall. Hoping someone with Windows knowledge ([~ivanmi]?) look at the native
code and someone else ([~cnauroth]?) at the common changes more carefully.

Reviewed the patch with focus on the YARN changes. Some comments follow..

bq. With a helper service the nodemanager no longer gets a free lunch of
accessing the task stdout/stderr
The NM never explicitly reads the stdout/stderr from the container, the streams
are redirected today to their own log files according as the user's code
dictates (for e.g in linux bash -c user-command.sh 1 stderr 2stdout). Do we
need to do this in the WintuilsProcessStubExecutor ?

The LinuxContainerExecutor reads the configuration from a
container-executor.cfg. We may want to unify the configuration for the
executors if in another JIRA.

Rename hadoopwinutilsvc* interfaces, file-names, classes to be something like
WindowsContainerLauncherService or similar to be explicit?

Not sure to me from the patch as to how the service's port is configured. Is it
at the start time or through some configuration?

bq. 1. Service Access check.
Sorry for repeating what you said but if I understand correctly, we need two
things (1) restricting users who can launch the special service and (2)
restricting callers who can invoke the RPCs. So, this is done by the
combination of the OS doing the authentication and the authorization being
explicitly done by the service using the allowed list. Right?

Remove the need to run NodeManager as privileged account for Windows Secure
Container Executor
--

Key: YARN-2198
URL: https://issues.apache.org/jira/browse/YARN-2198
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Labels: security, windows
Attachments: YARN-2198.1.patch, YARN-2198.2.patch

YARN-1972 introduces a Secure Windows Container Executor. However this
executor requires a the process launching the container to be LocalSystem or
a member of the a local Administrators group. Since the process in question
is the NodeManager, the requirement translates to the entire NM to run as a
privileged account, a very large surface area to review and protect.
This proposal is to move the privileged operations into a dedicated NT
service. The NM can run as a low privilege account and communicate with the
privileged NT service when it needs to launch a container. This would reduce
the surface exposed to the high privileges.
There has to exist a secure, authenticated and authorized channel of
communication between the NM and the privileged NT service. Possible
alternatives are a new TCP endpoint, Java RPC etc. My proposal though would
be to use Windows LPC (Local Procedure Calls), which is a Windows platform
specific inter-process communication channel that satisfies all requirements
and is easy to deploy. The privileged NT service would register and listen on
an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop
with libwinutils which would host the LPC client code. The client would
connect to the LPC port (NtConnectPort) and send a message requesting a
container launch (NtRequestWaitReplyPort). LPC provides authentication and
the privileged NT service can use authorization API (AuthZ) to validate the
caller.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls


[ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080934#comment-14080934
 ] 

Junping Du commented on YARN-2051:
--

+1. Patch looks good to me. Will commit it tomorrow if no more feedback from 
others.

 Fix code bug and add more unit tests for PBImpls
 

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical
 Attachments: YARN-2051.v1.patch, YARN-2051.v2.patch


 From YARN-2016, we can see some bug could exist in PB implementation of 
 protocol. The bad news is most of these PBImpl don't have any unit test to 
 verify the info is not lost or changed after serialization/deserialization. 
 We should add more tests for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-07-31 Thread Vinod Kumar Vavilapalli (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080941#comment-14080941
]

Vinod Kumar Vavilapalli commented on YARN-2198:
---

Also, a nit: WintuilsProcessStubExecutor.assumeComplete - assertComplete?

Remove the need to run NodeManager as privileged account for Windows Secure
Container Executor
--

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common


[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080961#comment-14080961
 ] 

Hudson commented on YARN-2347:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1848 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1848/])
YARN-2347. Consolidated RMStateVersion and NMDBSchemaVersion into Version in 
yarn-server-common. Contributed by Junping Du. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1614838)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/Version.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb/VersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMStateVersion.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/RMStateVersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common

[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces


[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081025#comment-14081025
 ] 

Craig Welch commented on YARN-1994:
---

[~xgong] [~arpitagarwal] [~mipoto] patch .15 should be good to go - please take 
a look.  This is the .11 patch Xuan and Arpit already +1ed with the following 
two changes:
Milan's logic to support overriding the hostname in bind-host + service address 
cases added back it - factored slightly differently to insure it does not 
change behavior unless these have been configured, and moved to overloaded 
methods in Configuration where the base logic resides.  The only other change 
was that I  also moved the getSocketAddr to Configuration as well, I had wanted 
to do this originally to bring it closer to the original code - I didn't 
bother, but since I was making changes/retesting anyway, I went ahead and did 
it.  The new tests were changed to match.  [~mipoto], I successfully tested 
this with an introduced hostname which was not the base hostname of the 
box, and it worked as desired (this overrode the used name/connect address 
based on bind-host + address configuration to the introduced hostname)

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, 
 YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, 
 YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, 
 YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, 
 YARN-1994.6.patch, YARN-1994.7.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2304) TestWebServices fails intermittently


[ 
https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081029#comment-14081029
 ] 

Zhijie Shen commented on YARN-2304:
---

It seems that the test failures don't happen any more. Shall we close the jira?

 Test*WebServices* fails intermittently
 --

 Key: YARN-2304
 URL: https://issues.apache.org/jira/browse/YARN-2304
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Tsuyoshi OZAWA
 Attachments: test-failure-log-RMWeb.txt


 TestNMWebService, TestRMWebService, and TestAMWebService get failed with 
 address already get bind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically


[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081078#comment-14081078
 ] 

Xuan Gong commented on YARN-2212:
-

submit the same patch

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, 
 YARN-2212.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically


 [ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2212:


Attachment: YARN-2212.5.patch

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, 
 YARN-2212.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically


[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081090#comment-14081090
 ] 

Hadoop QA commented on YARN-2212:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658944/YARN-2212.5.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4497//console

This message is automatically generated.

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, 
 YARN-2212.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store


 [ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2033:
--

Attachment: YARN-2033.4.patch

Rebase against the latest trunk, and fix some bugs

 Investigate merging generic-history into the Timeline Store
 ---

 Key: YARN-2033
 URL: https://issues.apache.org/jira/browse/YARN-2033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
 YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
 YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
 YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch


 Having two different stores isn't amicable to generic insights on what's 
 happening with applications. This is to investigate porting generic-history 
 into the Timeline Store.
 One goal is to try and retain most of the client side interfaces as close to 
 what we have today.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store


 [ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2033:
--

Attachment: YARN-2033_ALL.4.patch

 Investigate merging generic-history into the Timeline Store
 ---

 Key: YARN-2033
 URL: https://issues.apache.org/jira/browse/YARN-2033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
 YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
 YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
 YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch


 Having two different stores isn't amicable to generic insights on what's 
 happening with applications. This is to investigate porting generic-history 
 into the Timeline Store.
 One goal is to try and retain most of the client side interfaces as close to 
 what we have today.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-31 Thread Mayank Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081193#comment-14081193
 ] 

Mayank Bansal commented on YARN-2069:
-

Hi [~wangda] ,

Thanks for your review comments.

Updating the patch with the fix.

Thanks,
Mayank

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch


 This is different from (even if related to, and likely share code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if queue has its guaranteed 
 capacity, it's individual users are treated in-line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-31 Thread Mayank Bansal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2069:


Attachment: YARN-2069-trunk-8.patch

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch


 This is different from (even if related to, and likely share code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if queue has its guaranteed 
 capacity, it's individual users are treated in-line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces


[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081203#comment-14081203
 ] 

Xuan Gong commented on YARN-1994:
-

[~mipoto] Do you have any other comments for this ?

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, 
 YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, 
 YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, 
 YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, 
 YARN-1994.6.patch, YARN-1994.7.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces


[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081207#comment-14081207
 ] 

Craig Welch commented on YARN-1994:
---

[~mipoto] Can you take a look at the latest patch?

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, 
 YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, 
 YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, 
 YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, 
 YARN-1994.6.patch, YARN-1994.7.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-07-31 Thread Carlo Curino (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081234#comment-14081234
]

Carlo Curino commented on YARN-1707:

Agreed on all of the above.
{quote}
I think for moving application across queue is not a ReservationSystem specific
change. I would suggest to check it will not violate restrictions in target
queue before moving it.
{quote}

This makes sense, we should compile a list of invariant to check for (I have a
few in mind, but feedback is likely useful).

Thanks,
Carlo

Making the CapacityScheduler more dynamic
-

Key: YARN-1707
URL: https://issues.apache.org/jira/browse/YARN-1707
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: capacity-scheduler
Attachments: YARN-1707.patch

The CapacityScheduler is a rather static at the moment, and refreshqueue
provides a rather heavy-handed way to reconfigure it. Moving towards
long-running services (tracked in YARN-896) and to enable more advanced
admission control and resource parcelling we need to make the
CapacityScheduler more dynamic. This is instrumental to the umbrella jira
YARN-1051.
Concretely this require the following changes:
* create queues dynamically
* destroy queues dynamically
* dynamically change queue parameters (e.g., capacity)
* modify refreshqueue validation to enforce sum(child.getCapacity())= 100%
instead of ==100%
We limit this to LeafQueues.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store


[ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081227#comment-14081227
 ] 

Hadoop QA commented on YARN-2033:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658949/YARN-2033_ALL.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 20 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4498//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4498//console

This message is automatically generated.

 Investigate merging generic-history into the Timeline Store
 ---

 Key: YARN-2033
 URL: https://issues.apache.org/jira/browse/YARN-2033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
 YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
 YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
 YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch


 Having two different stores isn't amicable to generic insights on what's 
 happening with applications. This is to investigate porting generic-history 
 into the Timeline Store.
 One goal is to try and retain most of the client side interfaces as close to 
 what we have today.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-07-31 Thread Jonathan Eagles (JIRA)

Jonathan Eagles created YARN-2375:
-

 Summary: Allow enabling/disabling timeline server per framework
 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Eagles






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits


[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081331#comment-14081331
 ] 

Hadoop QA commented on YARN-2069:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12658971/YARN-2069-trunk-8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4499//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4499//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4499//console

This message is automatically generated.

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch


 This is different from (even if related to, and likely share code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if queue has its guaranteed 
 capacity, it's individual users are treated in-line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-31 Thread Milan Potocnik (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081341#comment-14081341
 ] 

Milan Potocnik commented on YARN-1994:
--

[~cwelch] looks good, thanks for the effort!

+1 from me

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, 
 YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, 
 YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, 
 YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, 
 YARN-1994.6.patch, YARN-1994.7.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure


[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081382#comment-14081382
 ] 

Hadoop QA commented on YARN-2008:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658970/YARN-2008.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4500//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4500//console

This message is automatically generated.

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch


 If there are two queues, both allowed to use 100% of the actual resources in 
 the cluster. Q1 and Q2 currently use 50% of actual cluster's resources and 
 there is not actual space available. If we use current method to get 
 headroom, CapacityScheduler thinks there are still available resources for 
 users in Q1 but they have been used by Q2. 
 If the CapacityScheduelr has a hierarchy queue structure, it may report 
 incorrect queueMaxCap. Here is a example
  ||||rootQueue|| ||
 |  |   /   |  
   \ |
 |  L1ParentQueue1  |  |
 L1ParentQueue2|
 |  (allowed to use up 80% of its parent)|  | (allowed to use 20% 
 in minimum of its parent)|
 |/   | \ ||  
 |  L2LeafQueue1 |L2LeafQueue2 |  | 
 |(50% of its parent) |  (50% of its parent in minimum) |   |
 When we calculate headroom of a user in L2LeafQueue2, current method will 
 think L2LeafQueue2 can use 40% (80%*50%) of actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure. It is possible 
 that L1ParentQueue2 have used 40% of rootQueue resources right now. Actually, 
 L2LeafQueue2 can only use 30% (60%*50%). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Resolved] (YARN-2304) TestWebServices fails intermittently

2014-07-31 Thread Tsuyoshi OZAWA (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA resolved YARN-2304.
--

Resolution: Fixed

 Test*WebServices* fails intermittently
 --

 Key: YARN-2304
 URL: https://issues.apache.org/jira/browse/YARN-2304
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Tsuyoshi OZAWA
 Attachments: test-failure-log-RMWeb.txt


 TestNMWebService, TestRMWebService, and TestAMWebService get failed with 
 address already get bind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2304) TestWebServices fails intermittently

2014-07-31 Thread Tsuyoshi OZAWA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081487#comment-14081487
 ] 

Tsuyoshi OZAWA commented on YARN-2304:
--

[~zjshen], thank you for notifying. After the work by [~jlowe], we don't see 
the test failure anymore. Closed as a fixed problem.

 Test*WebServices* fails intermittently
 --

 Key: YARN-2304
 URL: https://issues.apache.org/jira/browse/YARN-2304
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Tsuyoshi OZAWA
 Attachments: test-failure-log-RMWeb.txt


 TestNMWebService, TestRMWebService, and TestAMWebService get failed with 
 address already get bind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces


 [ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1994:
--

Attachment: YARN-1994.15-branch2.patch

Adding a version of the patch for branch-2, because the one from trunk doesn't 
cleanly apply.  Minor changes to deal with some other uncommited work from 
trunk in a unit test.  This patch will fail when applied to trunk most likely, 
that can be ignored.

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, 
 YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, 
 YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15-branch2.patch, 
 YARN-1994.15.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, 
 YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces


[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081508#comment-14081508
 ] 

Xuan Gong commented on YARN-1994:
-

Committed to trunk and branch-2. Thanks, Craig, Arpit and Milan

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, 
 YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, 
 YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15-branch2.patch, 
 YARN-1994.15.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, 
 YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces


[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081535#comment-14081535
 ] 

Hudson commented on YARN-1994:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5992 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5992/])
YARN-1994. Expose YARN/MR endpoints on multiple interfaces. Contributed by 
Craig Welch, Milan Potocnik,and Arpit Agarwal (xgong: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1614981)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/AppContext.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/client/MRClientService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MockAppContext.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRuntimeEstimators.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JHAdminConfig.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRWebAppUtil.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryClientService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/server/HSAdminServer.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
*

[jira] [Created] (YARN-2376) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in

2014-07-31 Thread zhihai xu (JIRA)

zhihai xu created YARN-2376:
---

 Summary: Too many threads blocking on the global JobTracker lock 
from getJobCounters, optimize getJobCounters to release global JobTracker lock 
before access the per job counter in JobInProgress
 Key: YARN-2376
 URL: https://issues.apache.org/jira/browse/YARN-2376
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu


Too many threads blocking on the global JobTracker lock from getJobCounters, 
optimize getJobCounters to release global JobTracker lock before access the per 
job counter in JobInProgress. It may be a lot of JobClients to call 
getJobCounters in JobTracker at the same time, Current code will lock the 
JobTracker to block all the threads to get counter from JobInProgress. It is 
better to unlock the JobTracker when get counter from 
JobInProgress(job.getCounters(counters)). So all the theads can run parallel 
when access its own job counter.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2376) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in

2014-07-31 Thread zhihai xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2376:


Attachment: YARN-2376.000.patch

 Too many threads blocking on the global JobTracker lock from getJobCounters, 
 optimize getJobCounters to release global JobTracker lock before access the 
 per job counter in JobInProgress
 -

 Key: YARN-2376
 URL: https://issues.apache.org/jira/browse/YARN-2376
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2376.000.patch


 Too many threads blocking on the global JobTracker lock from getJobCounters, 
 optimize getJobCounters to release global JobTracker lock before access the 
 per job counter in JobInProgress. It may be a lot of JobClients to call 
 getJobCounters in JobTracker at the same time, Current code will lock the 
 JobTracker to block all the threads to get counter from JobInProgress. It is 
 better to unlock the JobTracker when get counter from 
 JobInProgress(job.getCounters(counters)). So all the theads can run parallel 
 when access its own job counter.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Resolved] (YARN-2376) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter i

2014-07-31 Thread zhihai xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu resolved YARN-2376.
-

Resolution: Duplicate

 Too many threads blocking on the global JobTracker lock from getJobCounters, 
 optimize getJobCounters to release global JobTracker lock before access the 
 per job counter in JobInProgress
 -

 Key: YARN-2376
 URL: https://issues.apache.org/jira/browse/YARN-2376
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2376.000.patch


 Too many threads blocking on the global JobTracker lock from getJobCounters, 
 optimize getJobCounters to release global JobTracker lock before access the 
 per job counter in JobInProgress. It may be a lot of JobClients to call 
 getJobCounters in JobTracker at the same time, Current code will lock the 
 JobTracker to block all the threads to get counter from JobInProgress. It is 
 better to unlock the JobTracker when get counter from 
 JobInProgress(job.getCounters(counters)). So all the theads can run parallel 
 when access its own job counter.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081718#comment-14081718
 ] 

Wangda Tan commented on YARN-2008:
--

Hi [~cwelch],
I found the patch you updated is identical with *.3.patch, could you please 
check?

Thanks

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch


 If there are two queues, both allowed to use 100% of the actual resources in 
 the cluster. Q1 and Q2 currently use 50% of actual cluster's resources and 
 there is not actual space available. If we use current method to get 
 headroom, CapacityScheduler thinks there are still available resources for 
 users in Q1 but they have been used by Q2. 
 If the CapacityScheduelr has a hierarchy queue structure, it may report 
 incorrect queueMaxCap. Here is a example
  ||||rootQueue|| ||
 |  |   /   |  
   \ |
 |  L1ParentQueue1  |  |
 L1ParentQueue2|
 |  (allowed to use up 80% of its parent)|  | (allowed to use 20% 
 in minimum of its parent)|
 |/   | \ ||  
 |  L2LeafQueue1 |L2LeafQueue2 |  | 
 |(50% of its parent) |  (50% of its parent in minimum) |   |
 When we calculate headroom of a user in L2LeafQueue2, current method will 
 think L2LeafQueue2 can use 40% (80%*50%) of actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure. It is possible 
 that L1ParentQueue2 have used 40% of rootQueue resources right now. Actually, 
 L2LeafQueue2 can only use 30% (60%*50%). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2377) Localization exception stack traces are not passed as diagnostic info

Gera Shegalov created YARN-2377:
---

 Summary: Localization exception stack traces are not passed as 
diagnostic info
 Key: YARN-2377
 URL: https://issues.apache.org/jira/browse/YARN-2377
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov


In the Localizer log one can only see this kind of message
{code}
14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: 
ha-nn-uri-0
{code}

And then onlt {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is 
propagated as diagnostics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2377) Localization exception stack traces are not passed as diagnostic info


 [ 
https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated YARN-2377:


Description: 
In the Localizer log one can only see this kind of message
{code}
14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: 
ha-nn-uri-0
{code}

And then only {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is 
propagated as diagnostics.

  was:
In the Localizer log one can only see this kind of message
{code}
14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: 
ha-nn-uri-0
{code}

And then onlt {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is 
propagated as diagnostics.


 Localization exception stack traces are not passed as diagnostic info
 -

 Key: YARN-2377
 URL: https://issues.apache.org/jira/browse/YARN-2377
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov

 In the Localizer log one can only see this kind of message
 {code}
 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
 hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
  1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos 
 tException: ha-nn-uri-0
 {code}
 And then only {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is 
 propagated as diagnostics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2377) Localization exception stack traces are not passed as diagnostic info


 [ 
https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated YARN-2377:


Description: 
In the Localizer log one can only see this kind of message
{code}
14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: 
ha-nn-uri-0
{code}

And then only {{ java.net.UnknownHostException: ha-nn-uri-0}} message is 
propagated as diagnostics.

  was:
In the Localizer log one can only see this kind of message
{code}
14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: 
ha-nn-uri-0
{code}

And then only {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is 
propagated as diagnostics.


 Localization exception stack traces are not passed as diagnostic info
 -

 Key: YARN-2377
 URL: https://issues.apache.org/jira/browse/YARN-2377
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov

 In the Localizer log one can only see this kind of message
 {code}
 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
 hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
  1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos 
 tException: ha-nn-uri-0
 {code}
 And then only {{ java.net.UnknownHostException: ha-nn-uri-0}} message is 
 propagated as diagnostics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document

2014-07-31 Thread Fengdong Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081777#comment-14081777
 ] 

Fengdong Yu commented on YARN-2372:
---

I cannot find any more places on this issue by now. Thanks.

 There are Chinese Characters in the FairScheduler's document
 

 Key: YARN-2372
 URL: https://issues.apache.org/jira/browse/YARN-2372
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.1
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, 
 YARN-2372.patch, YARN-2372.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2377) Localization exception stack traces are not passed as diagnostic info


 [ 
https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated YARN-2377:


Attachment: YARN-2377.v01.patch

v01 for review. With this you get a more actionable stack trace:

{code}
14/07/31 17:46:39 INFO mapreduce.Job: Job job_1406853387336_0001 failed with 
state FAILED due to: Application application_1406853387336_0001 failed 2 times 
due to AM Container for appattempt_1406853387336_0001_02 exited with  
exitCode: -1000
For more detailed output, check application tracking 
page:http://tw-mbp-gshegalov:8088/proxy/application_1406853387336_0001/Then, 
click on links to logs of each attempt.
Diagnostics: java.net.UnknownHostException: ha-nn-uri-0
java.lang.IllegalArgumentException: java.net.UnknownHostException: ha-nn-uri-0
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:260)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:607)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:552)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2590)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2624)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2606)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:248)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:60)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:354)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:353)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
Caused by: java.net.UnknownHostException: ha-nn-uri-0
... 29 more
Caused by: ha-nn-uri-0
java.lang.IllegalArgumentException: java.net.UnknownHostException: ha-nn-uri-0
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:260)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:607)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:552)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2590)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2624)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2606)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:248)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:60)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:354)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:353)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
at

[jira] [Updated] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure


 [ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2008:
--

Attachment: YARN-2008.5.patch

This time, actually with the additional tests :-) 

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch, YARN-2008.5.patch


 If there are two queues, both allowed to use 100% of the actual resources in 
 the cluster. Q1 and Q2 currently use 50% of actual cluster's resources and 
 there is not actual space available. If we use current method to get 
 headroom, CapacityScheduler thinks there are still available resources for 
 users in Q1 but they have been used by Q2. 
 If the CapacityScheduelr has a hierarchy queue structure, it may report 
 incorrect queueMaxCap. Here is a example
  ||||rootQueue|| ||
 |  |   /   |  
   \ |
 |  L1ParentQueue1  |  |
 L1ParentQueue2|
 |  (allowed to use up 80% of its parent)|  | (allowed to use 20% 
 in minimum of its parent)|
 |/   | \ ||  
 |  L2LeafQueue1 |L2LeafQueue2 |  | 
 |(50% of its parent) |  (50% of its parent in minimum) |   |
 When we calculate headroom of a user in L2LeafQueue2, current method will 
 think L2LeafQueue2 can use 40% (80%*50%) of actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure. It is possible 
 that L1ParentQueue2 have used 40% of rootQueue resources right now. Actually, 
 L2LeafQueue2 can only use 30% (60%*50%). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically


[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081826#comment-14081826
 ] 

Hadoop QA commented on YARN-2212:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12658995/YARN-2212.5.rebase.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4501//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4501//console

This message is automatically generated.

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, 
 YARN-2212.5.patch, YARN-2212.5.rebase.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document


[ 
https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081832#comment-14081832
 ] 

Zhijie Shen commented on YARN-2372:
---

There're non-unicode double quotes in HdfsDesign.apt.vm and 
HdfsNfsGateway.apt.vm. It's not a big change, and I think we can fix them in 
one patch.

 There are Chinese Characters in the FairScheduler's document
 

 Key: YARN-2372
 URL: https://issues.apache.org/jira/browse/YARN-2372
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.1
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, 
 YARN-2372.patch, YARN-2372.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2288) Data persistent in timelinestore should be versioned


 [ 
https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2288:
-

Attachment: YARN-2288.patch

Upload the patch to version timelinestore.

 Data persistent in timelinestore should be versioned
 

 Key: YARN-2288
 URL: https://issues.apache.org/jira/browse/YARN-2288
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.1
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2288.patch


 We have LevelDB-backed TimelineStore, it should have schema version for 
 changes in schema in future.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically


[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081841#comment-14081841
 ] 

Xuan Gong commented on YARN-2212:
-

Test can be passed locally..

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, 
 YARN-2212.5.patch, YARN-2212.5.rebase.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081844#comment-14081844
 ] 

Wangda Tan commented on YARN-2008:
--

Hi [~cwelch],
Thanks for updating, now tests can cover all cases I can think about, 
A very minor comment:
Could you please add a small ε for all {{assertEquals}} like following?
bq. +assertEquals( 0.1f, result, 0.01f);

Thanks,
Wangda 



 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch, YARN-2008.5.patch


 If there are two queues, both allowed to use 100% of the actual resources in 
 the cluster. Q1 and Q2 currently use 50% of actual cluster's resources and 
 there is not actual space available. If we use current method to get 
 headroom, CapacityScheduler thinks there are still available resources for 
 users in Q1 but they have been used by Q2. 
 If the CapacityScheduelr has a hierarchy queue structure, it may report 
 incorrect queueMaxCap. Here is a example
  ||||rootQueue|| ||
 |  |   /   |  
   \ |
 |  L1ParentQueue1  |  |
 L1ParentQueue2|
 |  (allowed to use up 80% of its parent)|  | (allowed to use 20% 
 in minimum of its parent)|
 |/   | \ ||  
 |  L2LeafQueue1 |L2LeafQueue2 |  | 
 |(50% of its parent) |  (50% of its parent in minimum) |   |
 When we calculate headroom of a user in L2LeafQueue2, current method will 
 think L2LeafQueue2 can use 40% (80%*50%) of actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure. It is possible 
 that L1ParentQueue2 have used 40% of rootQueue resources right now. Actually, 
 L2LeafQueue2 can only use 30% (60%*50%). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081845#comment-14081845
 ] 

Wangda Tan commented on YARN-2069:
--

Hi [~mayank_bansal],
Thanks for uploading, reviewing it now.

Wangda

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch, 
 YARN-2069-trunk-9.patch


 This is different from (even if related to, and likely share code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if queue has its guaranteed 
 capacity, it's individual users are treated in-line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2051) Fix bug in PBimpls and add more unit tests with reflection


 [ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2051:
-

Summary: Fix bug in PBimpls and add more unit tests with reflection  (was: 
Fix code in PBimpls and add more unit tests with reflection)

 Fix bug in PBimpls and add more unit tests with reflection
 --

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical
 Attachments: YARN-2051.v1.patch, YARN-2051.v2.patch


 From YARN-2016, we can see some bug could exist in PB implementation of 
 protocol. The bad news is most of these PBImpl don't have any unit test to 
 verify the info is not lost or changed after serialization/deserialization. 
 We should add more tests for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2051) Fix bug in PBimpls and add more unit tests with reflection