[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087311#comment-14087311 ] Varun Vasudev commented on YARN-2378: - [~subru], have you looked at YARN-2248? It also allows you to move apps between queues in CapacityScheduler. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 into smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087357#comment-14087357 ] Junping Du commented on YARN-2288: -- The test failure seems to be related to the testbed configuration, not to the patch. Kicking off the Jenkins test again manually. Data persistent in timelinestore should be versioned Key: YARN-2288 URL: https://issues.apache.org/jira/browse/YARN-2288 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2288-v2.patch, YARN-2288.patch We have a LevelDB-backed TimelineStore; it should have a schema version to accommodate future schema changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
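[Editor's note] For illustration, a minimal sketch of the versioning idea described above, assuming a hypothetical key name, version string, and fail-fast policy for incompatible stores (the actual YARN-2288 patch may differ): stamp a fresh LevelDB store with the current schema version, and refuse to open a store written with an unknown version.
{code}
// Sketch only: the key name, version value, and failure policy below are
// assumptions for illustration, not the actual YARN-2288 patch.
import java.io.File;
import java.io.IOException;
import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class TimelineStoreVersionCheck {
  private static final byte[] VERSION_KEY = JniDBFactory.bytes("schema-version"); // hypothetical key
  private static final String CURRENT_VERSION = "1.0"; // hypothetical version

  public static DB openWithVersionCheck(File path) throws IOException {
    DB db = JniDBFactory.factory.open(path, new Options().createIfMissing(true));
    byte[] stored = db.get(VERSION_KEY);
    if (stored == null) {
      // Fresh store: stamp it so future releases can detect the schema.
      db.put(VERSION_KEY, JniDBFactory.bytes(CURRENT_VERSION));
    } else if (!CURRENT_VERSION.equals(JniDBFactory.asString(stored))) {
      // Unknown schema: fail fast instead of misreading the data.
      db.close();
      throw new IOException("Incompatible timeline store schema: "
          + JniDBFactory.asString(stored));
    }
    return db;
  }
}
{code}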
[jira] [Commented] (YARN-1336) Work-preserving nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087360#comment-14087360 ] Junping Du commented on YARN-1336: -- Got it. Will help to review YARN-1337. Thanks [~jlowe]! Work-preserving nodemanager restart --- Key: YARN-1336 URL: https://issues.apache.org/jira/browse/YARN-1336 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: NMRestartDesignOverview.pdf, YARN-1336-rollup-v2.patch, YARN-1336-rollup.patch This serves as an umbrella ticket for tasks related to work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087380#comment-14087380 ] Hadoop QA commented on YARN-2288: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660025/YARN-2288-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4530//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4530//console This message is automatically generated. Data persistent in timelinestore should be versioned Key: YARN-2288 URL: https://issues.apache.org/jira/browse/YARN-2288 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2288-v2.patch, YARN-2288.patch We have LevelDB-backed TimelineStore, it should have schema version for changes in schema in future. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
[ https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087411#comment-14087411 ] Varun Vasudev commented on YARN-2374: - [~jianhe] and [~gkesavan] spoke offline and fixed the hostname; Junping resubmitted the patch to Jenkins. Thank you to all three. YARN trunk build failing TestDistributedShell.testDSShell - Key: YARN-2374 URL: https://issues.apache.org/jira/browse/YARN-2374 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2374.0.patch, apache-yarn-2374.1.patch, apache-yarn-2374.2.patch, apache-yarn-2374.3.patch, apache-yarn-2374.4.patch The YARN trunk build has been failing for the last few days in the distributed shell module. {noformat} testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 27.269 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1954: - Attachment: YARN-1954.5.patch Thank you for the comments, [~zjshen]. Updated the patch: 1. Changed to throw IllegalArgumentException when the arguments are invalid. 2. Added a new argument {{logInterval}} to the {{waitFor}} API. 3. Removed unnecessary changes. 4. Changed to check countDownChecker#counter == 3 after waitFor in TestAMRMClient#testWaitFor. 5. Removed an unnecessary synchronized block. Instead, added a synchronized block around {{callback}} so that the main thread reads the correct value, because {{callback.notify}} is updated in another thread. Add waitFor to AMRMClient(Async) Key: YARN-1954 URL: https://issues.apache.org/jira/browse/YARN-1954 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent the AM process from exiting before all the tasks are done, while unregistration is triggered on a separate daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it would be beneficial to add a waitFor method to AMRMClient(Async) to block the AM until unregistration or a user-supplied check point, so that users don't need to write the loop themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
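[Editor's note] For illustration, a minimal sketch of the waitFor idea being discussed here; the exact signature, the {{logInterval}} semantics (log every N checks), and the polling interval are assumptions drawn from the comments above, not the committed Hadoop API.
{code}
import java.util.function.Supplier;

public final class WaitForSketch {
  // Blocks until the user-supplied check passes; logs every logInterval loops.
  // Signature and semantics are assumed from the discussion, not the real API.
  public static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      int logInterval) throws InterruptedException {
    // Change #1 above: invalid arguments throw IllegalArgumentException.
    if (check == null || checkEveryMillis < 0 || logInterval <= 0) {
      throw new IllegalArgumentException("invalid waitFor arguments");
    }
    int loops = 0;
    while (!check.get()) {              // user-supplied check point
      if (++loops % logInterval == 0) {
        System.out.println("waitFor: condition not met yet, still waiting");
      }
      Thread.sleep(checkEveryMillis);   // replaces the busy dummy loop in the AM
    }
  }
}
{code}
An AM main thread could then call something like waitFor(() -> done.get(), 100, 20), where done is an AtomicBoolean flipped by the callback handler, instead of spinning in a hand-written loop.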
[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1954: - Attachment: (was: YARN-1954.5.patch) Add waitFor to AMRMClient(Async) Key: YARN-1954 URL: https://issues.apache.org/jira/browse/YARN-1954 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, YARN-1954.4.patch, YARN-1954.4.patch Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent AM process exiting before all the tasks are done, while unregistration is triggered on a separate another daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it should be beneficial to add a waitFor method to AMRMClient(Async) to block the AM until unregistration or user supplied check point, such that users don't need to write the loop themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1954: - Attachment: YARN-1954.5.patch Add waitFor to AMRMClient(Async) Key: YARN-1954 URL: https://issues.apache.org/jira/browse/YARN-1954 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent AM process exiting before all the tasks are done, while unregistration is triggered on a separate another daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it should be beneficial to add a waitFor method to AMRMClient(Async) to block the AM until unregistration or user supplied check point, such that users don't need to write the loop themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2382) Resource Manager throws InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087458#comment-14087458 ] Nishan Shetty commented on YARN-2382: - Hi [~ywskycn], this issue occurred when the RM was restarted while a job was in progress. Which configuration do you need? Can you please specify? Resource Manager throws InvalidStateTransitonException -- Key: YARN-2382 URL: https://issues.apache.org/jira/browse/YARN-2382 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty {code} 2014-08-05 03:44:47,882 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.18.40.26/10.18.40.26:11578, initiating session 2014-08-05 03:44:47,888 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.18.40.26/10.18.40.26:11578, sessionid = 0x347a051fda60035, negotiated timeout = 1 2014-08-05 03:44:47,889 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-05 03:44:47,890 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-05 03:44:47,890 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at 
current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745) at
[jira] [Updated] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2138: --- Attachment: YARN-2138.patch Cleanup notifyDone* methods in RMStateStore --- Key: YARN-2138 URL: https://issues.apache.org/jira/browse/YARN-2138 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Attachments: YARN-2138.patch The storedException passed into notifyDoneStoringApplication is always null. Similarly for other notifyDone* methods. We can clean up these methods as this control flow path is not used anymore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2130: - Attachment: YARN-2130.8.patch Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7-2.patch, YARN-2130.7.patch, YARN-2130.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087482#comment-14087482 ] Tsuyoshi OZAWA commented on YARN-2130: -- Rebased on trunk. Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7-2.patch, YARN-2130.7.patch, YARN-2130.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087501#comment-14087501 ] Hadoop QA commented on YARN-1954: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660096/YARN-1954.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.async.impl.TestAMRMClientAsync org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4531//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4531//console This message is automatically generated. Add waitFor to AMRMClient(Async) Key: YARN-1954 URL: https://issues.apache.org/jira/browse/YARN-1954 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent AM process exiting before all the tasks are done, while unregistration is triggered on a separate another daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it should be beneficial to add a waitFor method to AMRMClient(Async) to block the AM until unregistration or user supplied check point, such that users don't need to write the loop themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087523#comment-14087523 ] Hadoop QA commented on YARN-1954: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660096/YARN-1954.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.async.impl.TestAMRMClientAsync org.apache.hadoop.yarn.client.api.impl.TestAMRMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4532//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4532//console This message is automatically generated. Add waitFor to AMRMClient(Async) Key: YARN-1954 URL: https://issues.apache.org/jira/browse/YARN-1954 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent AM process exiting before all the tasks are done, while unregistration is triggered on a separate another daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it should be beneficial to add a waitFor method to AMRMClient(Async) to block the AM until unregistration or user supplied check point, such that users don't need to write the loop themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2298) Move TimelineClient to yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087537#comment-14087537 ] Hudson commented on YARN-2298: -- FAILURE: Integrated in Hadoop-Yarn-trunk #635 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/635/]) YARN-2298. Move TimelineClient to yarn-common project (Contributed by Zhijie Shen) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616100) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/package-info.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/package-info.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java Move TimelineClient to yarn-common project -- Key: YARN-2298 URL: https://issues.apache.org/jira/browse/YARN-2298 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2298.1.patch To allow RM to reuse the timeline client code, we have to move it out of yarn-client module, due to maven dependency issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2382) Resource Manager throws InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087543#comment-14087543 ] Wei Yan commented on YARN-2382: --- thanks, [~nishan], that's enough information. Resource Manager throws InvalidStateTransitonException -- Key: YARN-2382 URL: https://issues.apache.org/jira/browse/YARN-2382 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty {code} 2014-08-05 03:44:47,882 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.18.40.26/10.18.40.26:11578, initiating session 2014-08-05 03:44:47,888 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.18.40.26/10.18.40.26:11578, sessionid = 0x347a051fda60035, negotiated timeout = 1 2014-08-05 03:44:47,889 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-05 03:44:47,890 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-05 03:44:47,890 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid 
event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at
[jira] [Commented] (YARN-2381) aa
[ https://issues.apache.org/jira/browse/YARN-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087549#comment-14087549 ] Wei Yan commented on YARN-2381: --- Never mind, [~Jackliu91]. aa -- Key: YARN-2381 URL: https://issues.apache.org/jira/browse/YARN-2381 Project: Hadoop YARN Issue Type: Test Reporter: JiankunLiu Priority: Blocker -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087557#comment-14087557 ] Hadoop QA commented on YARN-2138: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660104/YARN-2138.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4533//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4533//console This message is automatically generated. Cleanup notifyDone* methods in RMStateStore --- Key: YARN-2138 URL: https://issues.apache.org/jira/browse/YARN-2138 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Attachments: YARN-2138.patch The storedException passed into notifyDoneStoringApplication is always null. Similarly for other notifyDone* methods. We can clean up these methods as this control flow path is not used anymore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087583#comment-14087583 ] Hadoop QA commented on YARN-2130: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660110/YARN-2130.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA org.apache.hadoop.yarn.client.TestRMFailover org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient org.apache.hadoop.yarn.client.TestGetGroups org.apache.hadoop.yarn.client.TestResourceManagerAdministrationProtocolPBClientImpl org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.api.impl.TestYarnClient org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4534//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4534//console This message is automatically generated. Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7-2.patch, YARN-2130.7.patch, YARN-2130.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087593#comment-14087593 ] Varun Saxena commented on YARN-2138: [~jianhe], kindly review the changes in the patch. I have made the following changes: 1. Deleted the classes RMAppUpdatedSavedEvent, RMAppNewSavedEvent, RMAppAttemptNewSavedEvent and RMAppAttemptUpdateSavedEvent, as they offered no functionality beyond the base class once the stored and updated exceptions were removed. 2. Refactored code in RMStateStore and removed the notifyDone* methods. 3. Removed the corresponding exception-handling code in RMAppImpl and RMAppAttemptImpl. 4. Made the necessary changes in test cases. Cleanup notifyDone* methods in RMStateStore --- Key: YARN-2138 URL: https://issues.apache.org/jira/browse/YARN-2138 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Attachments: YARN-2138.patch The storedException passed into notifyDoneStoringApplication is always null. Similarly for other notifyDone* methods. We can clean up these methods as this control flow path is not used anymore. -- This message was sent by Atlassian JIRA (v6.2#6252)
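[Editor's note] To illustrate point 1 above, a sketch with hypothetical simplified names (the real events live in the RM's rmapp and attempt packages): once the exception argument is always null, a *SavedEvent subclass carries nothing the base event doesn't, so it can be deleted.
{code}
// Before: the subclass exists only to carry a storedException that callers
// always pass as null. Names here are illustrative stand-ins.
class AppEvent {
  private final String appId; // simplified stand-in for ApplicationId
  AppEvent(String appId) { this.appId = appId; }
}

class AppNewSavedEvent extends AppEvent {
  private final Exception storedException; // always null in practice
  AppNewSavedEvent(String appId, Exception storedException) {
    super(appId);
    this.storedException = storedException;
  }
}

// After: the subclass is deleted; the state store raises new AppEvent(appId)
// directly, and the handlers drop their dead exception-checking branches.
{code}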
[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1954: - Attachment: YARN-1954.6.patch Fixed the patch to pass the tests: * Updated the test case. * The following change was unnecessary, so removed it: {quote} 5. Removed an unnecessary synchronized block. Instead, added a synchronized block around callback so that the main thread reads the correct value, because callback.notify is updated in another thread. {quote} Add waitFor to AMRMClient(Async) Key: YARN-1954 URL: https://issues.apache.org/jira/browse/YARN-1954 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent the AM process from exiting before all the tasks are done, while unregistration is triggered on a separate daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it would be beneficial to add a waitFor method to AMRMClient(Async) to block the AM until unregistration or a user-supplied check point, so that users don't need to write the loop themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2298) Move TimelineClient to yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087629#comment-14087629 ] Hudson commented on YARN-2298: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1829 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1829/]) YARN-2298. Move TimelineClient to yarn-common project (Contributed by Zhijie Shen) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616100) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/package-info.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/package-info.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java Move TimelineClient to yarn-common project -- Key: YARN-2298 URL: https://issues.apache.org/jira/browse/YARN-2298 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2298.1.patch To allow RM to reuse the timeline client code, we have to move it out of yarn-client module, due to maven dependency issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2298) Move TimelineClient to yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087653#comment-14087653 ] Hudson commented on YARN-2298: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1855 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1855/]) YARN-2298. Move TimelineClient to yarn-common project (Contributed by Zhijie Shen) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616100) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/package-info.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/package-info.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java Move TimelineClient to yarn-common project -- Key: YARN-2298 URL: https://issues.apache.org/jira/browse/YARN-2298 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2298.1.patch To allow RM to reuse the timeline client code, we have to move it out of yarn-client module, due to maven dependency issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087665#comment-14087665 ] Jason Lowe commented on YARN-1337: -- I'm unable to reproduce these test failures locally. Checking a few of the test failures shows they are likely all failing because the machine can't look up its own name, e.g. java.net.UnknownHostException: asf901.ygridcore.net: asf901.ygridcore.net. I'll work with ops to get the machine fixed and re-kick Jenkins. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch To support work-preserving NM restart, we need to recover the state of the containers from when the nodemanager went down. This includes informing the RM of containers that exited in the interim, a strategy for dealing with the exit codes from those containers, and a way to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087690#comment-14087690 ] Hadoop QA commented on YARN-1954: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660129/YARN-1954.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4535//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4535//console This message is automatically generated. Add waitFor to AMRMClient(Async) Key: YARN-1954 URL: https://issues.apache.org/jira/browse/YARN-1954 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent AM process exiting before all the tasks are done, while unregistration is triggered on a separate another daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it should be beneficial to add a waitFor method to AMRMClient(Async) to block the AM until unregistration or user supplied check point, such that users don't need to write the loop themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087723#comment-14087723 ] Wangda Tan commented on YARN-2378: -- Hi [~subru], Thanks for uploading the patch; I took a look at it. As mentioned by [~vvasudev], there's another JIRA (YARN-2248) related to moving. I think the two JIRAs have different advantages, and I hope you can decide how to merge your work. - YARN-2378 covers the RMApp-related changes, which should be done while moving - YARN-2248 covers more tests for queue metrics. I think another major difference is that YARN-2248 will check queue capacity before moving and YARN-2378 will not. I had an offline discussion with [~curino] about this; here I paste what he said: {code} Imagine I have a busy cluster and want to migrate apps from queue A to queue B. Since we do not provide any transactional semantics from the CLI, it would be quite hard to make sure I can move an app (even if I kill everything in queue B and then invoke move A-B, more apps might show up and crowd the target queue B before I can successfully move). Having move be more sturdy and succeed right away, and enhancing preemption (if needed) to repair invariants, seems a better option in this scenario. I think preemption would already enforce max capacity, and other active JIRAs should deal with user-limit as well. More generally, I think preemption can eventually be our universal rebalancer/enforcer, allowing us to play a bit more fast and loose with move/resizing of queues. {code} I agree with this; another example is queue refresh: when queue capacities are refreshed, some queues may be shrunk below their guaranteed/used resources. We will not stop such a queue refresh, and preemption will take care of it as well. Some other comments about YARN-2378: 1) I think we should implement the state-store write in the move transition: {code} // TODO: Write out change to state store (YARN-1558) // Also take care of RM failover moveEvent.getResult().set(null); {code} 2) There are lots of test failures; I'm afraid the patch broke some major logic, could you please check? I will include a test review in the next iteration. Thanks, Wangda Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 into smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087732#comment-14087732 ] Tsuyoshi OZAWA commented on YARN-1954: -- It's ready for review. [~zjshen], could you review the latest patch? Add waitFor to AMRMClient(Async) Key: YARN-1954 URL: https://issues.apache.org/jira/browse/YARN-1954 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent the AM process from exiting before all the tasks are done, while unregistration is triggered on a separate daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it would be beneficial to add a waitFor method to AMRMClient(Async) to block the AM until unregistration or a user-supplied check point, so that users don't need to write the loop themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1177) Support automatic failover using ZKFC
[ https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1177: --- Attachment: yarn-1177-ancient-version.patch Here is an ancient version of the patch that does *not* apply on the latest trunk. Posting it in case anyone is particularly interested in taking this further before I get to it. Support automatic failover using ZKFC - Key: YARN-1177 URL: https://issues.apache.org/jira/browse/YARN-1177 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1177-ancient-version.patch Prior to embedding leader election and failover controller in the RM (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic failover implementation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087760#comment-14087760 ] Karthik Kambatla commented on YARN-2359: +1. Will commit this later today if no one objects. Application is hung without timeout and retry after DNS/network is down. - Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch, YARN-2359.002.patch The application hangs without timeout and retry after the DNS/network goes down. This happens because, right after the container is allocated for the AM, the DNS/network goes down on the node that has the AM container. The application attempt is in state RMAppAttemptState.SCHEDULED; it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event, but because an IllegalArgumentException (due to the DNS error) is thrown, it stays in state RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code doesn't handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application still hangs in state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send the RMAppAttemptEventType.KILL event, which is only generated when you manually kill the application from the Job Client via forceKillApplication. To fix the issue, we should add an entry to the state machine table to handle the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED by adding the following code in StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2130: - Attachment: YARN-2130.8.patch Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7-2.patch, YARN-2130.7.patch, YARN-2130.8.patch, YARN-2130.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087762#comment-14087762 ] Tsuyoshi OZAWA commented on YARN-2130: -- {quote} java.net.UnknownHostException: asf901.ygridcore.net: asf901.ygridcore.net at java.net.InetAddress.getLocalHost(InetAddress.java:1402) {quote} The test failure looks strange - all of the failures are caused by UnknownHostException. Let me kick CI with the same patch again. Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7-2.patch, YARN-2130.7.patch, YARN-2130.8.patch, YARN-2130.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087765#comment-14087765 ] Karthik Kambatla commented on YARN-2352: The test failures seem unrelated; they are caused by java.net.UnknownHostException: asf901.ygridcore.net: asf901.ygridcore.net. YARN-1337 had a similar issue, and it appears to be due to the build machine. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2352: --- Attachment: yarn-2352-2.patch Uploading the same patch again to see if Jenkins would run this on a different machine. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087827#comment-14087827 ] Tsuyoshi OZAWA commented on YARN-2352: -- I found the same test failure caused by java.net.UnknownHostException: asf901.ygridcore.net: asf901.ygridcore.net on YARN-2130. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087843#comment-14087843 ] Hadoop QA commented on YARN-2130: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660150/YARN-2130.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4536//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4536//console This message is automatically generated. Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7-2.patch, YARN-2130.7.patch, YARN-2130.8.patch, YARN-2130.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087848#comment-14087848 ] Tsuyoshi OZAWA commented on YARN-2359: -- +1 (non-binding), it looks good to me. Also ran the tests and confirmed that it works. Application is hung without timeout and retry after DNS/network is down. - Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch, YARN-2359.002.patch Application is hung without timeout and retry after DNS/network is down. This happens when, right after the container is allocated for the AM, the DNS/network goes down for the node that hosts the AM container. The application attempt is in state RMAppAttemptState.SCHEDULED; when it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event, an IllegalArgumentException (due to the DNS error) occurs and the attempt stays in state RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code didn't handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application is still hung in state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send the RMAppAttemptEventType.KILL event, which is only generated when you manually kill the application from the Job Client via forceKillApplication. To fix the issue, we should add an entry in the state machine table to handle the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED by adding the following code in StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087851#comment-14087851 ] Tsuyoshi OZAWA commented on YARN-2130: -- [~kkambatl], could you check the latest patch? I think it addresses all the points you mentioned. Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7-2.patch, YARN-2130.7.patch, YARN-2130.8.patch, YARN-2130.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087860#comment-14087860 ] Hadoop QA commented on YARN-2352: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660151/yarn-2352-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4537//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4537//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4537//console This message is automatically generated. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1177) Support automatic failover using ZKFC
[ https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1177: --- Assignee: Wei Yan (was: Karthik Kambatla) Support automatic failover using ZKFC - Key: YARN-1177 URL: https://issues.apache.org/jira/browse/YARN-1177 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Wei Yan Attachments: yarn-1177-ancient-version.patch Prior to embedding leader election and failover controller in the RM (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic failover implementation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1177) Support automatic failover using ZKFC
[ https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087873#comment-14087873 ] Karthik Kambatla commented on YARN-1177: [~ywskycn] - all yours. Support automatic failover using ZKFC - Key: YARN-1177 URL: https://issues.apache.org/jira/browse/YARN-1177 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Wei Yan Attachments: yarn-1177-ancient-version.patch Prior to embedding leader election and failover controller in the RM (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic failover implementation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1177) Support automatic failover using ZKFC
[ https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087869#comment-14087869 ] Wei Yan commented on YARN-1177: --- Hey [~kasha], I'm interested in taking it. Could you assign it to me? Support automatic failover using ZKFC - Key: YARN-1177 URL: https://issues.apache.org/jira/browse/YARN-1177 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1177-ancient-version.patch Prior to embedding leader election and failover controller in the RM (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic failover implementation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087887#comment-14087887 ] Sandy Ryza commented on YARN-2352: -- IIUC, this patch will only record the duration. If we go that route, I think we should call these metrics lastNodeUpdateDuration, etc. However, would it make sense to go with an approach that records more historical information? For example, RPCMetrics uses a MutableRate to keep stats on the processing time for RPCs, and I think a similar model could work here. Last, is there any need to make the FSPerfMetrics instance static? Right now, I think the Fair Scheduler has managed to avoid any mutable static variables. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
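A rough sketch of the MutableRate approach Sandy references, assuming the org.apache.hadoop.metrics2.lib API as used by RpcMetrics; the FSOpDurations class and metric names here are hypothetical, not the patch's.
{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Hypothetical holder class; the patch's actual class and metric names
// may differ.
class FSOpDurations {
  private final MetricsRegistry registry = new MetricsRegistry("FSOpDurations");
  private final MutableRate nodeUpdateRate = registry.newRate("NodeUpdateCall");

  void addNodeUpdateDuration(long durationMs) {
    // Each sample feeds running count/average statistics, unlike a
    // plain gauge that only remembers the last observed duration.
    nodeUpdateRate.add(durationMs);
  }
}
{code}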
[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088002#comment-14088002 ] Jian He commented on YARN-2359: --- [~zxu], thanks for working on it. I have a question: bq. The application attempt is in state RMAppAttemptState.SCHEDULED; when it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event, an IllegalArgumentException (due to the DNS error) occurs and the attempt stays in state RMAppAttemptState.SCHEDULED. Where in the code is the IllegalArgumentException thrown? Application is hung without timeout and retry after DNS/network is down. - Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch, YARN-2359.002.patch Application is hung without timeout and retry after DNS/network is down. This happens when, right after the container is allocated for the AM, the DNS/network goes down for the node that hosts the AM container. The application attempt is in state RMAppAttemptState.SCHEDULED; when it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event, an IllegalArgumentException (due to the DNS error) occurs and the attempt stays in state RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code didn't handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application is still hung in state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send the RMAppAttemptEventType.KILL event, which is only generated when you manually kill the application from the Job Client via forceKillApplication. To fix the issue, we should add an entry in the state machine table to handle the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED by adding the following code in StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
[ https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088017#comment-14088017 ] Jian He commented on YARN-2374: --- checking this in. YARN trunk build failing TestDistributedShell.testDSShell - Key: YARN-2374 URL: https://issues.apache.org/jira/browse/YARN-2374 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2374.0.patch, apache-yarn-2374.1.patch, apache-yarn-2374.2.patch, apache-yarn-2374.3.patch, apache-yarn-2374.4.patch The YARN trunk build has been failing for the last few days in the distributed shell module. {noformat} testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 27.269 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088047#comment-14088047 ] zhihai xu commented on YARN-2359: - [~jianhe] The code is in pullNewlyAllocatedContainersAndNMTokens of SchedulerApplicationAttempt.java {code} try { // create container token and NMToken altogether. container.setContainerToken(rmContext.getContainerTokenSecretManager() .createContainerToken(container.getId(), container.getNodeId(), getUser(), container.getResource(), container.getPriority(), rmContainer.getCreationTime())); NMToken nmToken = rmContext.getNMTokenSecretManager().createAndGetNMToken(getUser(), getApplicationAttemptId(), container); if (nmToken != null) { nmTokens.add(nmToken); } } catch (IllegalArgumentException e) { // DNS might be down, skip returning this container. LOG.error("Error trying to assign container token and NM token to" + " an allocated container " + container.getId(), e); continue; } {code} When the IllegalArgumentException happens in createContainerToken, the code skips the container, so zero containers are returned in amContainerAllocation. The following code in AMContainerAllocatedTransition in RMAppAttemptImpl.java will then keep retrying CONTAINER_ALLOCATED in the SCHEDULED state. So the IllegalArgumentException causes zero containers to be returned in amContainerAllocation, which causes RMAppAttemptImpl to stay in state RMAppAttemptState.SCHEDULED. {code} if (amContainerAllocation.getContainers().size() == 0) { appAttempt.retryFetchingAMContainer(appAttempt); return RMAppAttemptState.SCHEDULED; } {code} Application is hung without timeout and retry after DNS/network is down. - Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch, YARN-2359.002.patch Application is hung without timeout and retry after DNS/network is down. This happens when, right after the container is allocated for the AM, the DNS/network goes down for the node that hosts the AM container. The application attempt is in state RMAppAttemptState.SCHEDULED; when it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event, an IllegalArgumentException (due to the DNS error) occurs and the attempt stays in state RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code didn't handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application is still hung in state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send the RMAppAttemptEventType.KILL event, which is only generated when you manually kill the application from the Job Client via forceKillApplication. To fix the issue, we should add an entry in the state machine table to handle the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED by adding the following code in StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
[ https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088060#comment-14088060 ] Hudson commented on YARN-2374: -- FAILURE: Integrated in Hadoop-trunk-Commit #6023 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6023/]) YARN-2374. Fixed TestDistributedShell#testDSShell failure due to hostname dismatch. Contributed by Varun Vasudev (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616302) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java YARN trunk build failing TestDistributedShell.testDSShell - Key: YARN-2374 URL: https://issues.apache.org/jira/browse/YARN-2374 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2374.0.patch, apache-yarn-2374.1.patch, apache-yarn-2374.2.patch, apache-yarn-2374.3.patch, apache-yarn-2374.4.patch The YARN trunk build has been failing for the last few days in the distributed shell module. {noformat} testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 27.269 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2387) Resource Manager crashes with NPE due to lack of synchronization
Mit Desai created YARN-2387: --- Summary: Resource Manager crashes with NPE due to lack of synchronization Key: YARN-2387 URL: https://issues.apache.org/jira/browse/YARN-2387 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai We recently came across a 0.23 RM crashing with an NPE. Here is the stacktrace for it. {noformat} 2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34) at org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55) at java.lang.String.valueOf(String.java:2854) at java.lang.StringBuilder.append(StringBuilder.java:128) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353) at java.lang.String.valueOf(String.java:2854) at java.lang.StringBuilder.append(StringBuilder.java:128) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339) at java.lang.Thread.run(Thread.java:722) 2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. {noformat} On investigating the issue, we found that ContainerStatusPBImpl has methods that are called by different threads and are not synchronized. The 2.x code looks the same. We need to make these methods synchronized so that we do not encounter this problem in the future. -- This message was sent by Atlassian JIRA (v6.2#6252)
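A self-contained illustration (not the YARN code) of the race the stack trace suggests: a record that lazily merges local state into a shared builder can NPE when one thread nulls the builder while another is mid-merge, and marking the merge/get methods synchronized, as the report proposes for ContainerStatusPBImpl, removes the race.
{code}
// Standalone Java sketch, not the actual PBImpl code: PBImpl records
// lazily merge local state into a builder, then invalidate the builder
// after building the proto. Without synchronization, thread A can null
// the builder while thread B is inside mergeLocalToBuilder().
class LazyRecord {
  private StringBuilder builder = new StringBuilder();
  private String proto;

  // synchronized keeps merge + build + reset atomic per record
  synchronized void mergeLocalToBuilder(String field) {
    builder.append(field);
  }

  synchronized String getProto() {
    proto = builder.toString();
    builder = null;          // builder invalidated after building...
    return proto;
  }

  synchronized void ensureBuilder() {
    if (builder == null) {   // ...and recreated on the next write
      builder = new StringBuilder(proto);
    }
  }
}
{code}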
[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088085#comment-14088085 ] Zhijie Shen commented on YARN-2288: --- TestTimelineWebServices fails on trunk; it seems to have been broken by HADOOP-10791. I'll file a separate ticket. Data persistent in timelinestore should be versioned Key: YARN-2288 URL: https://issues.apache.org/jira/browse/YARN-2288 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2288-v2.patch, YARN-2288.patch We have a LevelDB-backed TimelineStore; it should have a schema version to accommodate future schema changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2388) TestTimelineWebServices fails on trunk after HADOOP-10791
Zhijie Shen created YARN-2388: - Summary: TestTimelineWebServices fails on trunk after HADOOP-10791 Key: YARN-2388 URL: https://issues.apache.org/jira/browse/YARN-2388 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Zhijie Shen https://builds.apache.org/job/PreCommit-YARN-Build/4530//testReport/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2388) TestTimelineWebServices fails on trunk after HADOOP-10791
[ https://issues.apache.org/jira/browse/YARN-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2388: -- Attachment: YARN-2388.1.patch Make a quick fix for the test failure. TestTimelineWebServices fails on trunk after HADOOP-10791 - Key: YARN-2388 URL: https://issues.apache.org/jira/browse/YARN-2388 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2388.1.patch https://builds.apache.org/job/PreCommit-YARN-Build/4530//testReport/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088107#comment-14088107 ] Zhijie Shen commented on YARN-2288: --- bq. If objects in store will get lost after TS restart, we don't need it. What do you think? I neglected the fact that it is persisted. I agree on it. bq. Do we have plan to persistent MemoryTimelineStore? At least we're going to have a HbaseTimelineStore. CURRENT_VERSION_INFO can be case-by-case for each impl, but TS_STORE_VERSION_KEY is going to be a common constant across different impls. In addition, TS_STORE_VERSION_KEY -> TIMELINE_STORE_VERSION_KEY? Some other nits: 1. "T" -> "t"? {code} + "Incompatible version for Timeline store: expecting version " {code} 2. Unnecessary change? {code} - @SuppressWarnings("resource") {code} Other than that, I think the patch is good to go. Data persistent in timelinestore should be versioned Key: YARN-2288 URL: https://issues.apache.org/jira/browse/YARN-2288 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2288-v2.patch, YARN-2288.patch We have a LevelDB-backed TimelineStore; it should have a schema version to accommodate future schema changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
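For readers following the thread, a small sketch of the kind of version check being discussed, assuming the iq80 LevelDB handle behind LeveldbTimelineStore; the key name, version encoding, and method shape are illustrative guesses, not the patch itself.
{code}
import static org.fusesource.leveldbjni.JniDBFactory.asString;
import static org.fusesource.leveldbjni.JniDBFactory.bytes;

import org.iq80.leveldb.DB;

class TimelineStoreVersionCheck {
  // Hypothetical names; the thread above is still debating whether the
  // key constant should be TS_STORE_VERSION_KEY or TIMELINE_STORE_VERSION_KEY.
  private static final byte[] TIMELINE_STORE_VERSION_KEY =
      bytes("timeline-store-version");
  private static final String CURRENT_VERSION_INFO = "1.0";

  static void checkVersion(DB db) {
    byte[] stored = db.get(TIMELINE_STORE_VERSION_KEY);
    if (stored == null) {
      // Fresh store: stamp the current schema version.
      db.put(TIMELINE_STORE_VERSION_KEY, bytes(CURRENT_VERSION_INFO));
    } else if (!CURRENT_VERSION_INFO.equals(asString(stored))) {
      // Refuse to load data written under an incompatible schema.
      throw new IllegalStateException(
          "Incompatible version for timeline store: expecting version "
          + CURRENT_VERSION_INFO + " but loading version " + asString(stored));
    }
  }
}
{code}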
[jira] [Created] (YARN-2389) Adding support for draining a queue, i.e. killing all apps in the queue
Subramaniam Venkatraman Krishnan created YARN-2389: -- Summary: Adding support for draining a queue, i.e. killing all apps in the queue Key: YARN-2389 URL: https://issues.apache.org/jira/browse/YARN-2389 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to the target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2385) Adding support for listing all applications in a queue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2385: --- Component/s: capacityscheduler Adding support for listing all applications in a queue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla Labels: abstractyarnscheduler This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2385) Adding support for listing all applications in a queue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2385: --- Summary: Adding support for listing all applications in a queue (was: Adding support for move all applications from a source queue to destination queue) Adding support for listing all applications in a queue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla Labels: abstractyarnscheduler This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2385) Adding support for listing all applications in a queue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2385: --- Labels: abstractyarnscheduler (was: fairscheduler) Adding support for listing all applications in a queue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla Labels: abstractyarnscheduler This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2389) Adding support for draining a queue, i.e. killing all apps in the queue
[ https://issues.apache.org/jira/browse/YARN-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2389: --- Labels: capacity-scheduler fairscheduler (was: fairscheduler) Adding support for draining a queue, i.e. killing all apps in the queue Key: YARN-2389 URL: https://issues.apache.org/jira/browse/YARN-2389 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla Labels: capacity-scheduler, fairscheduler This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to the target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2385) Adding support for listing all applications in a queue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2385: --- Description: This JIRA proposes adding a method in AbstractYarnScheduler to get all the pending/active applications. Fair scheduler already supports moving a single application from one queue to another. Support for the same is being added to Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition of this method, we can transparently add support for moving all applications from source queue to target queue and draining a queue, i.e. killing all applications in a queue as proposed by YARN-2389 (was: This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to target.) Adding support for listing all applications in a queue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla Labels: abstractyarnscheduler This JIRA proposes adding a method in AbstractYarnScheduler to get all the pending/active applications. Fair scheduler already supports moving a single application from one queue to another. Support for the same is being added to Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition of this method, we can transparently add support for moving all applications from source queue to target queue and draining a queue, i.e. killing all applications in a queue as proposed by YARN-2389 -- This message was sent by Atlassian JIRA (v6.2#6252)
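A guess at the shape of the AbstractYarnScheduler addition this description proposes; the method name, return type, and null-on-missing-queue contract are assumptions drawn from the description, not the committed interface. A move-all or drain operation (YARN-2389) could then iterate this list and call the existing single-application move or kill paths.
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

// Hypothetical sketch of the proposed scheduler method.
abstract class QueueAppListing {
  // Return the attempt ids of all pending and active applications in
  // the named queue, or null if the queue does not exist (guessed
  // contract).
  abstract List<ApplicationAttemptId> getAppsInQueue(String queueName);
}
{code}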
[jira] [Updated] (YARN-2389) Adding support for draining a queue, i.e. killing all apps in the queue
[ https://issues.apache.org/jira/browse/YARN-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2389: --- Description: This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to the target. This will use YARN-2385, so it will work for both the Capacity and Fair schedulers. (was: This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to target.) Adding support for draining a queue, i.e. killing all apps in the queue Key: YARN-2389 URL: https://issues.apache.org/jira/browse/YARN-2389 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla Labels: capacity-scheduler, fairscheduler This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to the target. This will use YARN-2385, so it will work for both the Capacity and Fair schedulers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2389) Adding support for draining a queue, i.e. killing all apps in the queue
[ https://issues.apache.org/jira/browse/YARN-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2389: --- Component/s: capacityscheduler Adding support for draining a queue, i.e. killing all apps in the queue Key: YARN-2389 URL: https://issues.apache.org/jira/browse/YARN-2389 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla Labels: capacity-scheduler, fairscheduler This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to the target. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2389) Adding support for draining a queue, i.e. killing all apps in the queue
[ https://issues.apache.org/jira/browse/YARN-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan reassigned YARN-2389: -- Assignee: Subramaniam Venkatraman Krishnan (was: Karthik Kambatla) Adding support for draining a queue, i.e. killing all apps in the queue Key: YARN-2389 URL: https://issues.apache.org/jira/browse/YARN-2389 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler, fairscheduler This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving a single application from one queue to another. This will add support to move all applications from the specified source queue to the target. This will use YARN-2385, so it will work for both the Capacity and Fair schedulers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
[ https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088121#comment-14088121 ] Hudson commented on YARN-2374: -- FAILURE: Integrated in Hadoop-Yarn-trunk #636 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/636/]) YARN-2374. Fixed TestDistributedShell#testDSShell failure due to hostname dismatch. Contributed by Varun Vasudev (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616302) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java YARN trunk build failing TestDistributedShell.testDSShell - Key: YARN-2374 URL: https://issues.apache.org/jira/browse/YARN-2374 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2374.0.patch, apache-yarn-2374.1.patch, apache-yarn-2374.2.patch, apache-yarn-2374.3.patch, apache-yarn-2374.4.patch The YARN trunk build has been failing for the last few days in the distributed shell module. {noformat} testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 27.269 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2248) Capacity Scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088137#comment-14088137 ] Subramaniam Venkatraman Krishnan commented on YARN-2248: Hi [~keyki], we have been working for some time on adding support for move in Capacity Scheduler as part of YARN-2378 (originally YARN-1707), and [~vvasudev] was kind enough to point out that you were doing the same. To prevent duplication, I suggest we merge our work. I looked at your patch, and we are doing essentially the same thing (which was good validation for both of us :)). Based on [~leftnoteasy]'s [feedback | https://issues.apache.org/jira/browse/YARN-2378?focusedCommentId=14087723], I think it would be easiest if I merged your metrics test with the patch I have. Would that be OK? Capacity Scheduler changes for moving apps between queues - Key: YARN-2248 URL: https://issues.apache.org/jira/browse/YARN-2248 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Janos Matyas Assignee: Janos Matyas Fix For: 2.6.0 Attachments: YARN-2248-1.patch, YARN-2248-2.patch, YARN-2248-3.patch We would like to have the capability (same as the Fair Scheduler has) to move applications between queues. We have made a baseline implementation and tests to start with - and we would like the community to review, come up with suggestions and finally have this contributed. The current implementation is available for 2.4.1 - so the first thing is that we'd need to identify the target version, as there are differences between the 2.4.* and 3.* interfaces. The story behind it is available at http://blog.sequenceiq.com/blog/2014/07/02/move-applications-between-queues/ and the baseline implementation and test at: https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/ExtendedCapacityScheduler.java#L924 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/TestExtendedCapacitySchedulerAppMove.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088146#comment-14088146 ] Subramaniam Venkatraman Krishnan commented on YARN-2378: Thanks [~vvasudev] for pointing out the parallel work and [~leftnoteasy] for your feedback. I agree we should merge both, and I have a [proposal | https://issues.apache.org/jira/browse/YARN-2248?focusedCommentId=14088137] based on your review. About your comments on the patch I uploaded: * Thanks for clarifying that we don't need to check capacity before a move. * I will look at implementing the state store in the move transition. * Will look at the test failures and fix them, my bad. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 into smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088161#comment-14088161 ] Jian He commented on YARN-2359: --- I see, thanks for your explanation. Looks good to me too. Application is hung without timeout and retry after DNS/network is down. - Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch, YARN-2359.002.patch Application is hung without timeout and retry after DNS/network is down. This happens when, right after the container is allocated for the AM, the DNS/network goes down for the node that hosts the AM container. The application attempt is in state RMAppAttemptState.SCHEDULED; when it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event, an IllegalArgumentException (due to the DNS error) occurs and the attempt stays in state RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code didn't handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application is still hung in state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send the RMAppAttemptEventType.KILL event, which is only generated when you manually kill the application from the Job Client via forceKillApplication. To fix the issue, we should add an entry in the state machine table to handle the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED by adding the following code in StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2248) Capacity Scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088185#comment-14088185 ] Krisztian Horvath commented on YARN-2248: - Hi, as long as we don't break the functionality, we can merge them and try to take the best out of both, so yes. Have you tried your patch with the queue metrics test yet? Capacity Scheduler changes for moving apps between queues - Key: YARN-2248 URL: https://issues.apache.org/jira/browse/YARN-2248 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Janos Matyas Assignee: Janos Matyas Fix For: 2.6.0 Attachments: YARN-2248-1.patch, YARN-2248-2.patch, YARN-2248-3.patch We would like to have the capability (same as the Fair Scheduler has) to move applications between queues. We have made a baseline implementation and tests to start with - and we would like the community to review, come up with suggestions and finally have this contributed. The current implementation is available for 2.4.1 - so the first thing is that we'd need to identify the target version, as there are differences between the 2.4.* and 3.* interfaces. The story behind it is available at http://blog.sequenceiq.com/blog/2014/07/02/move-applications-between-queues/ and the baseline implementation and test at: https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/ExtendedCapacityScheduler.java#L924 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/TestExtendedCapacitySchedulerAppMove.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088204#comment-14088204 ] Hadoop QA commented on YARN-1337: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659958/YARN-1337-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4538//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4538//console This message is automatically generated. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2388) TestTimelineWebServices fails on trunk after HADOOP-10791
[ https://issues.apache.org/jira/browse/YARN-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088238#comment-14088238 ] Hadoop QA commented on YARN-2388: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660201/YARN-2388.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4539//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4539//console This message is automatically generated. TestTimelineWebServices fails on trunk after HADOOP-10791 - Key: YARN-2388 URL: https://issues.apache.org/jira/browse/YARN-2388 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2388.1.patch https://builds.apache.org/job/PreCommit-YARN-Build/4530//testReport/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2388) TestTimelineWebServices fails on trunk after HADOOP-10791
[ https://issues.apache.org/jira/browse/YARN-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088243#comment-14088243 ] Xuan Gong commented on YARN-2388: - +1 LGTM TestTimelineWebServices fails on trunk after HADOOP-10791 - Key: YARN-2388 URL: https://issues.apache.org/jira/browse/YARN-2388 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2388.1.patch https://builds.apache.org/job/PreCommit-YARN-Build/4530//testReport/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2008: -- Attachment: YARN-2008.8.patch Make ResourceCalculator.isInvalidDivisor abstract and move the (corrected) implementations into Default and Dominant, checking for 0 memory, and for 0 memory or 0 vcores, respectively. CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He Assignee: Craig Welch Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, YARN-2008.4.patch, YARN-2008.5.patch, YARN-2008.6.patch, YARN-2008.7.patch, YARN-2008.8.patch Suppose there are two queues, both allowed to use 100% of the actual resources in the cluster. Q1 and Q2 each currently use 50% of the cluster's resources, and there is no actual space available. With the current method of getting headroom, the CapacityScheduler thinks there are still resources available for users in Q1, but they have already been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
rootQueue
+-- L1ParentQueue1 (allowed to use up to 80% of its parent)
|   +-- L2LeafQueue1 (50% of its parent)
|   +-- L2LeafQueue2 (50% of its parent in minimum)
+-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
When we calculate the headroom of a user in L2LeafQueue2, the current method assumes L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking how much of L1ParentQueue1's capacity is actually available, we cannot be sure: it is possible that L1ParentQueue2 has already used 40% of the rootQueue resources, in which case L2LeafQueue2 can actually only use 30% (60% * 50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
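The change Craig describes is easiest to see as code. A minimal sketch of the two checks, assuming the 2.x Resource API (getMemory/getVirtualCores); the class and helper names are illustrative, not taken from the patch:
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch only: the described change makes ResourceCalculator.isInvalidDivisor
// abstract; these static helpers show the two per-calculator checks.
final class IsInvalidDivisorSketch {

  // DefaultResourceCalculator divides by memory alone, so only a
  // zero-memory divisor is invalid.
  static boolean isInvalidDivisorDefault(Resource divisor) {
    return divisor.getMemory() == 0;
  }

  // DominantResourceCalculator divides along both dimensions, so a zero
  // in either memory or vcores makes the divisor invalid.
  static boolean isInvalidDivisorDominant(Resource divisor) {
    return divisor.getMemory() == 0 || divisor.getVirtualCores() == 0;
  }
}
{code}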
[jira] [Assigned] (YARN-1488) Allow containers to delegate resources to another container
[ https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reassigned YARN-1488: --- Assignee: Arun C Murthy Allow containers to delegate resources to another container --- Key: YARN-1488 URL: https://issues.apache.org/jira/browse/YARN-1488 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Arun C Murthy We should allow containers to delegate resources to another container. This would allow external frameworks to share not just YARN's resource-management capabilities but also its workload-management capabilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1488) Allow containers to delegate resources to another container
[ https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088309#comment-14088309 ] Arun C Murthy commented on YARN-1488: - I have an early patch that I'll share shortly; this feature request is coming up in a lot of places and has generated a lot of interest. Allow containers to delegate resources to another container --- Key: YARN-1488 URL: https://issues.apache.org/jira/browse/YARN-1488 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Arun C Murthy We should allow containers to delegate resources to another container. This would allow external frameworks to share not just YARN's resource-management capabilities but also its workload-management capabilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088346#comment-14088346 ] Jian He commented on YARN-2212: --- Looks good overall. Minor comments: - AllocateResponse#newInstance: the first newInstance should not be changed; it's marked stable. - // Should have exception: check the exception type. ApplicationMaster needs to find a way to update the AMRMToken periodically -- Key: YARN-2212 URL: https://issues.apache.org/jira/browse/YARN-2212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2212.1.patch, YARN-2212.2.patch, YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, YARN-2212.5.patch, YARN-2212.5.rebase.patch, YARN-2212.6.patch, YARN-2212.6.patch, YARN-2212.7.patch, YARN-2212.7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
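For context on what the AM side of this looks like, here is a minimal sketch of consuming a rolled-over AMRMToken from the heartbeat response, assuming the AMRMToken field that this work adds to AllocateResponse; the helper name and the use of ConverterUtils are illustrative, not taken from the patch:
{code}
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Sketch only: on each heartbeat, if the RM rolled the master key and
// sent a fresh AMRMToken, install it in the current UGI so the next
// allocate() call authenticates with the new key.
final class AmrmTokenUpdateSketch {
  static void maybeUpdateToken(AllocateResponse response,
      InetSocketAddress rmAddress) throws IOException {
    if (response.getAMRMToken() == null) {
      return; // no roll-over in this heartbeat
    }
    Token<? extends TokenIdentifier> amrmToken =
        ConverterUtils.convertFromYarn(response.getAMRMToken(), rmAddress);
    UserGroupInformation.getCurrentUser().addToken(amrmToken);
  }
}
{code}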
[jira] [Updated] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2212: Attachment: YARN-2212.8.patch ApplicationMaster needs to find a way to update the AMRMToken periodically -- Key: YARN-2212 URL: https://issues.apache.org/jira/browse/YARN-2212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2212.1.patch, YARN-2212.2.patch, YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, YARN-2212.5.patch, YARN-2212.5.rebase.patch, YARN-2212.6.patch, YARN-2212.6.patch, YARN-2212.7.patch, YARN-2212.7.patch, YARN-2212.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088447#comment-14088447 ] Xuan Gong commented on YARN-2212: - bq. AllocateResponse#newInstance: the first newInstance should not be changed, it’s marked stable FIXED bq. // Should have exception: check exception type FIXED ApplicationMaster needs to find a way to update the AMRMToken periodically -- Key: YARN-2212 URL: https://issues.apache.org/jira/browse/YARN-2212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2212.1.patch, YARN-2212.2.patch, YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, YARN-2212.5.patch, YARN-2212.5.rebase.patch, YARN-2212.6.patch, YARN-2212.6.patch, YARN-2212.7.patch, YARN-2212.7.patch, YARN-2212.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088470#comment-14088470 ] Hadoop QA commented on YARN-2008: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660231/YARN-2008.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4540//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4540//console This message is automatically generated. CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He Assignee: Craig Welch Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, YARN-2008.4.patch, YARN-2008.5.patch, YARN-2008.6.patch, YARN-2008.7.patch, YARN-2008.8.patch Suppose there are two queues, both allowed to use 100% of the actual resources in the cluster. Q1 and Q2 each currently use 50% of the cluster's resources, and there is no actual space available. With the current method of getting headroom, the CapacityScheduler thinks there are still resources available for users in Q1, but they have already been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
rootQueue
+-- L1ParentQueue1 (allowed to use up to 80% of its parent)
|   +-- L2LeafQueue1 (50% of its parent)
|   +-- L2LeafQueue2 (50% of its parent in minimum)
+-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
When we calculate the headroom of a user in L2LeafQueue2, the current method assumes L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking how much of L1ParentQueue1's capacity is actually available, we cannot be sure: it is possible that L1ParentQueue2 has already used 40% of the rootQueue resources, in which case L2LeafQueue2 can actually only use 30% (60% * 50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
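To see the numbers end to end, here is the naive versus parent-aware queueMaxCap calculation from the example above as a runnable illustration (arithmetic only, not the patch itself):
{code}
// Worked example of the headroom bug described above, using the
// description's numbers; an illustration, not the fix.
public final class QueueMaxCapSketch {
  public static void main(String[] args) {
    double l1Parent1Max = 0.80;  // L1ParentQueue1 may use up to 80% of root
    double l2Leaf2Share = 0.50;  // L2LeafQueue2 gets 50% of its parent
    double l1Parent2Used = 0.40; // L1ParentQueue2 already uses 40% of root

    // Naive calculation: multiply percentages down the hierarchy and
    // ignore what the parent's siblings are consuming.
    double naive = l1Parent1Max * l2Leaf2Share;

    // Parent-aware calculation: the parent can only grow into whatever
    // the rest of the cluster leaves free (60% here).
    double actual = Math.min(l1Parent1Max, 1.0 - l1Parent2Used) * l2Leaf2Share;

    System.out.printf("naive queueMaxCap  = %.0f%%%n", naive * 100);  // 40%
    System.out.printf("actual queueMaxCap = %.0f%%%n", actual * 100); // 30%
  }
}
{code}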
[jira] [Updated] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2352: --- Attachment: yarn-2352-3.patch Thanks, Sandy, for pointing me to RpcMetrics. MutableRate seemed like a good candidate for the stats we want to collect, so I updated the patch to use it. For MutableRate, I have enabled showing extended stats (stdev, min/max, etc.) by default; in the future, we can add a config to toggle this if we see any particular overhead. Regarding using a singleton: if I don't do this, the tests fail complaining of already-existing metrics for FSDurations. Even QueueMetrics has a static map that it re-uses. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch, yarn-2352-3.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
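A minimal sketch of the singleton-plus-MutableRate shape described above, assuming MetricsRegistry#newRate's extended-stats flag; the class and metric names are guesses for illustration, not the patch:
{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Sketch only: one MutableRate per critical FairScheduler method, with
// extended stats (stdev, min/max) enabled, held in a singleton so tests
// don't fail by re-registering "FSDurations".
final class FSDurationsSketch {
  private static FSDurationsSketch instance;

  private final MetricsRegistry registry = new MetricsRegistry("FSDurations");
  // The trailing 'true' turns on extended stats for the rate.
  private final MutableRate nodeUpdateDuration =
      registry.newRate("NodeUpdateDuration", "Handle node event", true);
  private final MutableRate updateDuration =
      registry.newRate("UpdateDuration", "Recompute fair shares", true);

  private FSDurationsSketch() {
  }

  static synchronized FSDurationsSketch getInstance() {
    if (instance == null) {
      instance = new FSDurationsSketch();
    }
    return instance;
  }

  void addNodeUpdateDuration(long durationMs) {
    nodeUpdateDuration.add(durationMs);
  }

  void addUpdateDuration(long durationMs) {
    updateDuration.add(durationMs);
  }
}
{code}
The singleton mirrors the reasoning in the comment: metrics sources register globally, so constructing a second instance in tests would collide with the first.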
[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088479#comment-14088479 ] Karthik Kambatla commented on YARN-2359: Checking this in. Application is hung without timeout and retry after DNS/network is down. - Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch, YARN-2359.002.patch The application hangs without timeout or retry after the DNS/network goes down. This happens because, right after the container is allocated for the AM, the DNS/network goes down on the node that holds the AM container. The application attempt is in state RMAppAttemptState.SCHEDULED; it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event and, because an IllegalArgumentException (due to the DNS error) is thrown, it stays in state RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code didn't handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application still hangs in state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send the RMAppAttemptEventType.KILL event, which is only generated when you manually kill the application from the Job Client via forceKillApplication. To fix the issue, we should add an entry to the state machine table that handles the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED, by adding the following code to the StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2359) Application hangs when it fails to launch AM container
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2359: --- Summary: Application hangs when it fails to launch AM container (was: Application is hung without timeout and retry after DNS/network is down. ) Application hangs when it fails to launch AM container --- Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch, YARN-2359.002.patch The application hangs without timeout or retry after the DNS/network goes down. This happens because, right after the container is allocated for the AM, the DNS/network goes down on the node that holds the AM container. The application attempt is in state RMAppAttemptState.SCHEDULED; it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event and, because an IllegalArgumentException (due to the DNS error) is thrown, it stays in state RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code didn't handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application still hangs in state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send the RMAppAttemptEventType.KILL event, which is only generated when you manually kill the application from the Job Client via forceKillApplication. To fix the issue, we should add an entry to the state machine table that handles the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED, by adding the following code to the StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088488#comment-14088488 ] Craig Welch commented on YARN-2008: --- TestAMRestart passes on my box with the changes; build server issue? CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He Assignee: Craig Welch Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, YARN-2008.4.patch, YARN-2008.5.patch, YARN-2008.6.patch, YARN-2008.7.patch, YARN-2008.8.patch Suppose there are two queues, both allowed to use 100% of the actual resources in the cluster. Q1 and Q2 each currently use 50% of the cluster's resources, and there is no actual space available. With the current method of getting headroom, the CapacityScheduler thinks there are still resources available for users in Q1, but they have already been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
rootQueue
+-- L1ParentQueue1 (allowed to use up to 80% of its parent)
|   +-- L2LeafQueue1 (50% of its parent)
|   +-- L2LeafQueue2 (50% of its parent in minimum)
+-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
When we calculate the headroom of a user in L2LeafQueue2, the current method assumes L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking how much of L1ParentQueue1's capacity is actually available, we cannot be sure: it is possible that L1ParentQueue2 has already used 40% of the rootQueue resources, in which case L2LeafQueue2 can actually only use 30% (60% * 50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered
[ https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2249: -- Attachment: YARN-2249.1.patch Instead of making client-side changes, I changed the RM to cache the outstanding release requests; a container won't be recovered if it remains in the cache. The cache is cleaned after the NM expiry interval if no such container is reported to the RM for recovery. Uploaded a patch based on that. RM may receive container release request on AM resync before container is actually recovered Key: YARN-2249 URL: https://issues.apache.org/jira/browse/YARN-2249 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2249.1.patch AM resync on RM restart will send outstanding container release requests back to the new RM. In the meantime, NMs report the container statuses back to the RM to recover the containers. If the RM receives a container release request before the container is actually recovered in the scheduler, the container won't be released and the release request will be lost. -- This message was sent by Atlassian JIRA (v6.2#6252)
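The caching idea Jian describes can be sketched in a few lines. All names and structure below are assumptions for illustration, not the patch:
{code}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ContainerId;

// Sketch only: remember release requests that arrive before a container
// is recovered, and skip recovery for a container the AM has already
// asked to release.
final class PendingReleaseCacheSketch {
  // ContainerId -> time the release request was received.
  private final Map<ContainerId, Long> pendingRelease =
      new ConcurrentHashMap<ContainerId, Long>();

  // Called when an AM resync sends a release for a not-yet-known container.
  void recordRelease(ContainerId containerId) {
    pendingRelease.put(containerId, System.currentTimeMillis());
  }

  // Called when an NM reports a running container for recovery; a cached
  // entry means the container should be killed instead of recovered.
  boolean shouldSkipRecovery(ContainerId containerId) {
    return pendingRelease.remove(containerId) != null;
  }

  // Entries older than the NM expiry interval can be dropped: any NM that
  // could have reported the container is considered lost by then.
  void cleanup(long nmExpiryIntervalMs) {
    long cutoff = System.currentTimeMillis() - nmExpiryIntervalMs;
    Iterator<Map.Entry<ContainerId, Long>> it =
        pendingRelease.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() < cutoff) {
        it.remove();
      }
    }
  }
}
{code}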
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-415: Attachment: YARN-415.201408062232.txt [~leftnoteasy], Thank you for your suggestions. I added an end-to-end unit test that covered most of your points. However, I had trouble setting up a test with more than one attempt for the same app. I think I covered the rest. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
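The chargeback formula in the YARN-415 description is simple enough to state as a worked example. A minimal sketch with made-up container numbers (the method and values are illustrative, not from the patch):
{code}
// Sketch only: MB-seconds for an app is the sum over its containers of
// reserved memory (MB) times container lifetime (seconds).
public final class MemorySecondsSketch {

  static long memorySeconds(long reservedMB, long startMillis,
      long finishMillis) {
    return reservedMB * ((finishMillis - startMillis) / 1000);
  }

  public static void main(String[] args) {
    // Two hypothetical containers: 2048 MB for 600 s, 1024 MB for 120 s.
    long total = memorySeconds(2048, 0, 600000)
        + memorySeconds(1024, 0, 120000);
    // 2048*600 + 1024*120 = 1228800 + 122880 = 1351680 MB-seconds
    System.out.println(total + " MB-seconds");
  }
}
{code}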
[jira] [Commented] (YARN-2248) Capacity Scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088619#comment-14088619 ] Subramaniam Venkatraman Krishnan commented on YARN-2248: Thanks [~keyki]. I just added all your test cases and ran them; they do pass with my patch, including the queue metrics test. The test cases are quite useful, thanks again. Capacity Scheduler changes for moving apps between queues - Key: YARN-2248 URL: https://issues.apache.org/jira/browse/YARN-2248 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Janos Matyas Assignee: Janos Matyas Fix For: 2.6.0 Attachments: YARN-2248-1.patch, YARN-2248-2.patch, YARN-2248-3.patch We would like to have the capability (same as the Fair Scheduler has) to move applications between queues. We have made a baseline implementation and tests to start with - and we would like the community to review, come up with suggestions and finally have this contributed. The current implementation is available for 2.4.1 - so the first thing is that we'd need to identify the target version as there are differences between 2.4.* and 3.* interfaces. The story behind it is available at http://blog.sequenceiq.com/blog/2014/07/02/move-applications-between-queues/ and the baseline implementation and test at: https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/ExtendedCapacityScheduler.java#L924 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/TestExtendedCapacitySchedulerAppMove.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2359) Application hangs when it fails to launch AM container
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088634#comment-14088634 ] Hudson commented on YARN-2359: -- FAILURE: Integrated in Hadoop-trunk-Commit #6025 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6025/]) YARN-2359. Application hangs when it fails to launch AM container. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616375) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Application hangs when it fails to launch AM container --- Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.6.0 Attachments: YARN-2359.000.patch, YARN-2359.001.patch, YARN-2359.002.patch The application hangs without timeout or retry after the DNS/network goes down. This happens because, right after the container is allocated for the AM, the DNS/network goes down on the node that holds the AM container. The application attempt is in state RMAppAttemptState.SCHEDULED; it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event and, because an IllegalArgumentException (due to the DNS error) is thrown, it stays in state RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code didn't handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application still hangs in state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send the RMAppAttemptEventType.KILL event, which is only generated when you manually kill the application from the Job Client via forceKillApplication. To fix the issue, we should add an entry to the state machine table that handles the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED, by adding the following code to the StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088646#comment-14088646 ] Hadoop QA commented on YARN-415: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660287/YARN-415.201408062232.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4542//console This message is automatically generated. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088653#comment-14088653 ] Hadoop QA commented on YARN-2212: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660260/YARN-2212.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4541//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4541//console This message is automatically generated. ApplicationMaster needs to find a way to update the AMRMToken periodically -- Key: YARN-2212 URL: https://issues.apache.org/jira/browse/YARN-2212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2212.1.patch, YARN-2212.2.patch, YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, YARN-2212.5.patch, YARN-2212.5.rebase.patch, YARN-2212.6.patch, YARN-2212.6.patch, YARN-2212.7.patch, YARN-2212.7.patch, YARN-2212.8.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2388) TestTimelineWebServices fails on trunk after HADOOP-10791
[ https://issues.apache.org/jira/browse/YARN-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088654#comment-14088654 ] Zhijie Shen commented on YARN-2388: --- [~xgong], thanks! I'll commit it later today if there are no further comments. TestTimelineWebServices fails on trunk after HADOOP-10791 - Key: YARN-2388 URL: https://issues.apache.org/jira/browse/YARN-2388 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2388.1.patch https://builds.apache.org/job/PreCommit-YARN-Build/4530//testReport/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2288) Data persistent in timelinestore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2288: - Attachment: YARN-2288-v3.patch Data persistent in timelinestore should be versioned Key: YARN-2288 URL: https://issues.apache.org/jira/browse/YARN-2288 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2288-v2.patch, YARN-2288-v3.patch, YARN-2288.patch We have a LevelDB-backed TimelineStore; it should have a schema version to accommodate future schema changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088655#comment-14088655 ] Junping Du commented on YARN-2288: -- Thanks for the review, [~zjshen]! Please see my reply below: bq. but TS_STORE_VERSION_KEY is going to be a common constant across different impls. In addition, TS_STORE_VERSION_KEY - TIMELINE_STORE_VERSION_KEY? Actually, I had a long discussion with Jason on YARN-2045, and both of us think we should keep the API (including public constants) as simple as possible. This key will not be used outside of the class or its subclasses, so there is no hard requirement to put it in the parent class (an interface, actually); the only value of putting it in the parent class is one line of code reuse, which is not necessary for some other subclasses (i.e. MemoryTimelineStore) and brings extra complexity to an interface that is simple now. So I prefer that it stay in the subclass until the HBase implementation is there and we have a strong reason to share it across different impls. Thoughts? I will fix the name issue here. bq. T - t? Nice catch. Will fix it soon. bq. Unnecessary change for - @SuppressWarnings(resource)? That just fixes a javac warning; fixing it in a separate patch sounds like overkill, so I included the fix here. Data persistent in timelinestore should be versioned Key: YARN-2288 URL: https://issues.apache.org/jira/browse/YARN-2288 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2288-v2.patch, YARN-2288-v3.patch, YARN-2288.patch We have a LevelDB-backed TimelineStore; it should have a schema version to accommodate future schema changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
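Junping's point about keeping the version key on the LevelDB-backed subclass is easier to see with a sketch. A minimal illustration of the versioning idea under discussion, assuming the org.iq80.leveldb API; the key name, version string, and method are assumptions, not the patch:
{code}
import java.io.IOException;
import java.nio.charset.Charset;

import org.iq80.leveldb.DB;

// Sketch only: stamp a fresh store with the current schema version and
// refuse to load data written under an incompatible schema.
final class TimelineStoreVersionSketch {
  private static final Charset UTF8 = Charset.forName("UTF-8");
  // Lives on the LevelDB-backed subclass, per the discussion above.
  private static final byte[] TIMELINE_STORE_VERSION_KEY =
      "timeline-store-version".getBytes(UTF8);
  private static final String CURRENT_VERSION = "1.0";

  static void checkVersion(DB db) throws IOException {
    byte[] stored = db.get(TIMELINE_STORE_VERSION_KEY);
    if (stored == null) {
      // Fresh store: record the schema version we are about to write.
      db.put(TIMELINE_STORE_VERSION_KEY, CURRENT_VERSION.getBytes(UTF8));
      return;
    }
    String version = new String(stored, UTF8);
    if (!CURRENT_VERSION.equals(version)) {
      throw new IOException("Incompatible timeline store schema version "
          + version + "; expected " + CURRENT_VERSION);
    }
  }
}
{code}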
[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088657#comment-14088657 ] Junping Du commented on YARN-2288: -- Addressed [~zjshen]'s comments in the v3 patch. Data persistent in timelinestore should be versioned Key: YARN-2288 URL: https://issues.apache.org/jira/browse/YARN-2288 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2288-v2.patch, YARN-2288-v3.patch, YARN-2288.patch We have a LevelDB-backed TimelineStore; it should have a schema version to accommodate future schema changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered
[ https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088663#comment-14088663 ] Hadoop QA commented on YARN-2249: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660275/YARN-2249.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4543//console This message is automatically generated. RM may receive container release request on AM resync before container is actually recovered Key: YARN-2249 URL: https://issues.apache.org/jira/browse/YARN-2249 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2249.1.patch AM resync on RM restart will send outstanding container release requests back to the new RM. In the meantime, NMs report the container statuses back to the RM to recover the containers. If the RM receives a container release request before the container is actually recovered in the scheduler, the container won't be released and the release request will be lost. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings
[ https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo reassigned YARN-1729: Assignee: Leitao Guo (was: Billie Rinaldi) TimelineWebServices always passes primary and secondary filters as strings -- Key: YARN-1729 URL: https://issues.apache.org/jira/browse/YARN-1729 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Leitao Guo Fix For: 2.4.0 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch Primary filters and secondary filter values can be arbitrary JSON-compatible Objects. The web services should determine whether the filters specified as query parameters are objects or strings before passing them to the store. -- This message was sent by Atlassian JIRA (v6.2#6252)
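The rule in the description (parse each filter query parameter as JSON when possible, otherwise treat it as a plain string) can be sketched in a few lines. This uses the Jackson 1.x ObjectMapper that Hadoop of this era ships; the helper name is illustrative, not from the patch:
{code}
import java.io.IOException;

import org.codehaus.jackson.map.ObjectMapper;

// Sketch only: try the raw query-parameter value as JSON first, so
// "123" becomes an Integer, "true" a Boolean, and "{\"a\":1}" a Map;
// anything that is not valid JSON stays a plain String.
final class FilterValueParseSketch {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  static Object parseFilterValue(String raw) {
    try {
      return MAPPER.readValue(raw, Object.class);
    } catch (IOException e) {
      return raw; // not JSON-compatible: keep the literal string
    }
  }
}
{code}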
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088745#comment-14088745 ] Hadoop QA commented on YARN-2352: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660272/yarn-2352-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4544//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4544//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4544//console This message is automatically generated. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch, yarn-2352-3.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered
[ https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2249: -- Attachment: YARN-2249.1.patch RM may receive container release request on AM resync before container is actually recovered Key: YARN-2249 URL: https://issues.apache.org/jira/browse/YARN-2249 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2249.1.patch, YARN-2249.1.patch AM resync on RM restart will send outstanding container release requests back to the new RM. In the meantime, NMs report the container statuses back to the RM to recover the containers. If the RM receives a container release request before the container is actually recovered in the scheduler, the container won't be released and the release request will be lost. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2248) Capacity Scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088832#comment-14088832 ] Krisztian Horvath commented on YARN-2248: - Is there a chance we can get this committed in 2.6.0? Capacity Scheduler changes for moving apps between queues - Key: YARN-2248 URL: https://issues.apache.org/jira/browse/YARN-2248 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Janos Matyas Assignee: Janos Matyas Fix For: 2.6.0 Attachments: YARN-2248-1.patch, YARN-2248-2.patch, YARN-2248-3.patch We would like to have the capability (same as the Fair Scheduler has) to move applications between queues. We have made a baseline implementation and tests to start with - and we would like the community to review, come up with suggestions and finally have this contributed. The current implementation is available for 2.4.1 - so the first thing is that we'd need to identify the target version as there are differences between 2.4.* and 3.* interfaces. The story behind it is available at http://blog.sequenceiq.com/blog/2014/07/02/move-applications-between-queues/ and the baseline implementation and test at: https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/ExtendedCapacityScheduler.java#L924 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/TestExtendedCapacitySchedulerAppMove.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088841#comment-14088841 ] Hadoop QA commented on YARN-2288: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660309/YARN-2288-v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices org.apache.hadoop.yarn.server.timeline.TestLeveldbTimelineStore {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4545//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4545//console This message is automatically generated. Data persistent in timelinestore should be versioned Key: YARN-2288 URL: https://issues.apache.org/jira/browse/YARN-2288 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2288-v2.patch, YARN-2288-v3.patch, YARN-2288-v4.patch, YARN-2288.patch We have a LevelDB-backed TimelineStore; it should have a schema version to accommodate future schema changes. -- This message was sent by Atlassian JIRA (v6.2#6252)