[jira] [Updated] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1885: - Attachment: YARN-1885.patch Attached a new patch addressing the discussions above: 1) Included integration tests 2) Removed ContainerAcquiredEvent in RMAppAttempt 3) Added NodeAddedEvent in RMApp RM may not send the finished signal to some nodes where the application ran after RM restarts - Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the CLI but I can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984151#comment-13984151 ] Devaraj K commented on YARN-1408: - bq. So in some race conditions, it is possible that a container can get KILLED by preemption even before it reaches the RUNNING state. This scenario can be avoided if we can skip such containers which didn't reach the RUNNING state during preemption. Maybe in the following cycles this container will reach the RUNNING state and can then be considered for preemption. I think we don't need to wait for the container to move to the RUNNING state before preempting it if it is eligible. If the container is eligible for preemption, the resources can be released in the current preemption cycle instead of waiting for the next preemption cycle for the container state to change to RUNNING, which saves the waste of launching and then killing the container. Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Fix For: 2.5.0 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable=true * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA to queue a which uses the full cluster capacity Step 2: Submit a jobB to queue b which would use less than 20% of the cluster capacity JobA tasks which use queue b's capacity are preempted and killed. This caused the problem below: 1. A new container got allocated for jobA in Queue A as per a node update from an NM. 2. This container was preempted immediately by preemption. An ACQUIRED at KILLED invalid state exception came when the next AM heartbeat reached the RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the task to hit a 30-minute timeout, as this container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
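As a rough illustration of the quoted suggestion above (skipping containers that have not yet reached RUNNING when selecting preemption candidates), a minimal sketch follows; the class and method names are hypothetical and not taken from any attached patch.
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerState;

// Hypothetical helper: keep only containers that have actually started running,
// leaving ALLOCATED/ACQUIRED containers for a later preemption cycle.
public class PreemptionCandidateFilter {
  public static List<RMContainer> skipNonRunning(List<RMContainer> candidates) {
    List<RMContainer> eligible = new ArrayList<RMContainer>();
    for (RMContainer c : candidates) {
      if (c.getState() == RMContainerState.RUNNING) {
        eligible.add(c);
      }
    }
    return eligible;
  }
}
{code}
Devaraj's counter-point above is the opposite trade-off: preempting such a container immediately avoids launching it only to kill it.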
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984152#comment-13984152 ] Devaraj K commented on YARN-1408: - Correction to the above comment. Sorry for the delay Sunil. {quote} So in some race conditions, it is possible that a container can get KILLED by preemption even before it reaches the RUNNING state. This scenario can be avoided if we can skip such containers which didn't reach the RUNNING state during preemption. Maybe in the following cycles this container will reach the RUNNING state and can then be considered for preemption. {quote} I think we don't need to wait for the container to move to the RUNNING state before preempting it if it is eligible. If the container is eligible for preemption, the resources can be released in the current preemption cycle instead of waiting for the next preemption cycle for the container state to change to RUNNING, which saves the waste of launching and then killing the container. Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Fix For: 2.5.0 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable=true * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA to queue a which uses the full cluster capacity Step 2: Submit a jobB to queue b which would use less than 20% of the cluster capacity JobA tasks which use queue b's capacity are preempted and killed. This caused the problem below: 1. A new container got allocated for jobA in Queue A as per a node update from an NM. 2. This container was preempted immediately by preemption. An ACQUIRED at KILLED invalid state exception came when the next AM heartbeat reached the RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the task to hit a 30-minute timeout, as this container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1996) Provide alternative policies for UNHEALTHY nodes.
Gera Shegalov created YARN-1996: --- Summary: Provide alternative policies for UNHEALTHY nodes. Key: YARN-1996 URL: https://issues.apache.org/jira/browse/YARN-1996 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, scheduler Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Currently, UNHEALTHY nodes can significantly prolong execution of large expensive jobs, as demonstrated by MAPREDUCE-5817, and degrade the cluster health even further due to [positive feedback|http://en.wikipedia.org/wiki/Positive_feedback]. A container set that might have deemed the node unhealthy in the first place starts spreading across the cluster, because the current node is declared unusable and all its containers are killed and rescheduled on different nodes. To mitigate this, we are experimenting with a patch that allows containers already running on a node that turns UNHEALTHY to complete (drain), while no new container can be assigned to it until it turns healthy again. This mechanism can also be used for graceful decommissioning of an NM. To this end, we have to write a health script such that it can deterministically report UNHEALTHY. For example with {code}
if [ -e $1 ] ; then
  echo ERROR Node decommissioning via health script hack
fi
{code} In the current version of the patch, the behavior is controlled by a boolean property {{yarn.nodemanager.unheathy.drain.containers}}. More versatile policies are possible in future work. Currently, the health state of a node is determined in a binary fashion based on the disk checker and the health script ERROR outputs. However, we could also interpret the health script output similarly to java logging levels (one of which is ERROR), such as WARN, FATAL. Each level can then be treated differently. E.g., - FATAL: unusable like today - ERROR: drain - WARN: halve the node capacity. complemented with some equivalence rules such as 3 WARN messages == ERROR, 2*ERROR == FATAL, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
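For reference, enabling the drain behavior described above would presumably amount to a yarn-site.xml entry like the following; the property name is taken verbatim from the description (including its spelling) and the value shown is an assumption, not a documented default.
{code}
<property>
  <!-- Proposed (not yet standard) switch: let running containers drain on a node
       that turns UNHEALTHY instead of killing and rescheduling them immediately. -->
  <name>yarn.nodemanager.unheathy.drain.containers</name>
  <value>true</value>
</property>
{code}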
[jira] [Updated] (YARN-1996) Provide alternative policies for UNHEALTHY nodes.
[ https://issues.apache.org/jira/browse/YARN-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1996: Attachment: YARN-1996.v01.patch Provide alternative policies for UNHEALTHY nodes. - Key: YARN-1996 URL: https://issues.apache.org/jira/browse/YARN-1996 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, scheduler Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1996.v01.patch Currently, UNHEALTHY nodes can significantly prolong execution of large expensive jobs as demonstrated by MAPREDUCE-5817, and downgrade the cluster health even further due to [positive feedback|http://en.wikipedia.org/wiki/Positive_feedback]. A container set that might have deemed the node unhealthy in the first place starts spreading across the cluster because the current node is declared unusable and all its containers are killed and rescheduled on different nodes. To mitigate this, we experiment with a patch that allows containers already running on a node turning UNHEALTHY to complete (drain) whereas no new container can be assigned to it until it turns healthy again. This mechanism can also be used for graceful decommissioning of NM. To this end, we have to write a health script such that it can deterministically report UNHEALTHY. For example with {code} if [ -e $1 ] ; then echo ERROR Node decommmissioning via health script hack fi {code} In the current version patch, the behavior is controlled by a boolean property {{yarn.nodemanager.unheathy.drain.containers}}. More versatile policies are possible in the future work. Currently, the health state of a node is binary determined based on the disk checker and the health script ERROR outputs. However, we can as well interpret health script output similar to java logging levels (one of which is ERROR) such as WARN, FATAL. Each level can then be treated differently. E.g., - FATAL: unusable like today - ERROR: drain - WARN: halve the node capacity. complimented with some equivalence rules such as 3 WARN messages == ERROR, 2*ERROR == FATAL, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1996) Provide alternative policies for UNHEALTHY nodes.
[ https://issues.apache.org/jira/browse/YARN-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984167#comment-13984167 ] Steve Loughran commented on YARN-1996: -- This sounds good - not just as failure handling, but for cluster management. This may be a duplicate of YARN-914 (graceful decommission of NM) and/or YARN-671. For long-lived services:
# it'd be nice to have a notification from the NM to the AM that it's draining and that they should react: YARN-1394
# the drain process must have a (configurable?) timeout, then kill all outstanding containers - without adding them as any kind of failure (i.e. the container loss event from the NM to the AM should indicate this)
# the AM itself needs to receive a your own node is being drained event and do any best-effort pre-restart operations (e.g. transition to passive), and the RM should not count the AM termination/restart as an AM failure
Provide alternative policies for UNHEALTHY nodes. - Key: YARN-1996 URL: https://issues.apache.org/jira/browse/YARN-1996 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, scheduler Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1996.v01.patch Currently, UNHEALTHY nodes can significantly prolong execution of large expensive jobs, as demonstrated by MAPREDUCE-5817, and degrade the cluster health even further due to [positive feedback|http://en.wikipedia.org/wiki/Positive_feedback]. A container set that might have deemed the node unhealthy in the first place starts spreading across the cluster, because the current node is declared unusable and all its containers are killed and rescheduled on different nodes. To mitigate this, we are experimenting with a patch that allows containers already running on a node that turns UNHEALTHY to complete (drain), while no new container can be assigned to it until it turns healthy again. This mechanism can also be used for graceful decommissioning of an NM. To this end, we have to write a health script such that it can deterministically report UNHEALTHY. For example with {code}
if [ -e $1 ] ; then
  echo ERROR Node decommissioning via health script hack
fi
{code} In the current version of the patch, the behavior is controlled by a boolean property {{yarn.nodemanager.unheathy.drain.containers}}. More versatile policies are possible in future work. Currently, the health state of a node is determined in a binary fashion based on the disk checker and the health script ERROR outputs. However, we could also interpret the health script output similarly to java logging levels (one of which is ERROR), such as WARN, FATAL. Each level can then be treated differently. E.g., - FATAL: unusable like today - ERROR: drain - WARN: halve the node capacity. complemented with some equivalence rules such as 3 WARN messages == ERROR, 2*ERROR == FATAL, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1980) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984188#comment-13984188 ] Devaraj K commented on YARN-1980: - The changes are in the mapreduce project; moving this to mapreduce. Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy -- Key: YARN-1980 URL: https://issues.apache.org/jira/browse/YARN-1980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Sunil G Attachments: Yarn-1980.1.patch I configured KillAMPreemptionPolicy for my Application Master and tried to check preemption of queues. In one scenario I have seen the below NPE in my AM: 2014-04-24 15:11:08,860 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM. java.lang.NullPointerException at org.apache.hadoop.mapreduce.v2.app.rm.preemption.KillAMPreemptionPolicy.preempt(KillAMPreemptionPolicy.java:57) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:662) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:246) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:267) at java.lang.Thread.run(Thread.java:662) I was using 2.2.0 and merged MAPREDUCE-5189 to see how AM preemption works. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1997) input split size
rrim created YARN-1997: -- Summary: input split size Key: YARN-1997 URL: https://issues.apache.org/jira/browse/YARN-1997 Project: Hadoop YARN Issue Type: Test Components: api Affects Versions: 2.2.0 Reporter: rrim Hi, I am using hadoop 2.2 and don't know how to set the max input split size. I would like to decrease this value in order to create more mappers. I tried updating yarn-site.xml, but it does not work; indeed, hadoop 2.2 / yarn does not pick up any of the following settings:
<property><name>mapreduce.input.fileinputformat.split.minsize</name><value>1</value></property>
<property><name>mapreduce.input.fileinputformat.split.maxsize</name><value>16777216</value></property>
<property><name>mapred.min.split.size</name><value>1</value></property>
<property><name>mapred.max.split.size</name><value>16777216</value></property>
best, -- This message was sent by Atlassian JIRA (v6.2#6252)
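Note that the split-size properties above are per-job MapReduce settings, typically placed in mapred-site.xml or on the job configuration rather than in yarn-site.xml. A hedged sketch of setting them programmatically with the new MapReduce API (mapper class and input/output paths omitted):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "small-splits");
    // Cap each split at 16 MB so that more map tasks are created.
    FileInputFormat.setMinInputSplitSize(job, 1L);
    FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);
    // ... configure mapper/reducer and input/output paths, then submit the job.
  }
}
{code}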
[jira] [Commented] (YARN-1982) Rename the daemon name to timelineserver
[ https://issues.apache.org/jira/browse/YARN-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984268#comment-13984268 ] Junping Du commented on YARN-1982: -- +1. Patch looks good to me. Rename the daemon name to timelineserver Key: YARN-1982 URL: https://issues.apache.org/jira/browse/YARN-1982 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: cli Attachments: YARN-1982.1.patch Nowadays, it's confusing that we call the new component the timeline server, but we use {code}
yarn historyserver
yarn-daemon.sh start historyserver
{code} to start the daemon. Before the confusion propagates further, we'd better modify the command line asap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1912) ResourceLocalizer started without any jvm memory control
[ https://issues.apache.org/jira/browse/YARN-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-1912: --- Attachment: YARN-1912-1.patch Updated the patch based on findbugs warnings. ResourceLocalizer started without any jvm memory control Key: YARN-1912 URL: https://issues.apache.org/jira/browse/YARN-1912 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: stanley shi Attachments: YARN-1912-0.patch, YARN-1912-1.patch In LinuxContainerExecutor.java#startLocalizer, the command does not specify any -Xmx configuration, which causes the ResourceLocalizer to be started with the default memory settings. On server-class hardware, it will use 25% of the system memory as the max heap size, which will cause memory issues in some cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
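A minimal sketch of the kind of change being discussed - adding an explicit heap limit to the localizer's java command line. The configuration key and default below are hypothetical illustrations, not the names used in the attached patches:
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;

public class LocalizerHeapOption {
  // Hypothetical configuration key and default (in MB) for the localizer heap size.
  static final String LOCALIZER_HEAP_MB = "yarn.nodemanager.localizer.heap-memory-mb";
  static final int DEFAULT_LOCALIZER_HEAP_MB = 256;

  /** Build a java command prefix for the ResourceLocalizer with an explicit -Xmx. */
  static List<String> javaCommandPrefix(Configuration conf) {
    int heapMb = conf.getInt(LOCALIZER_HEAP_MB, DEFAULT_LOCALIZER_HEAP_MB);
    List<String> command = new ArrayList<String>();
    command.add("java");
    // Cap the heap explicitly instead of relying on the JVM's 25%-of-RAM default.
    command.add("-Xmx" + heapMb + "m");
    return command;
  }
}
{code}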
[jira] [Updated] (YARN-1987) Wrapper for leveldb DBIterator to aid in handling database exceptions
[ https://issues.apache.org/jira/browse/YARN-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1987: - Attachment: YARN-1987.patch Wrapper for leveldb DBIterator to aid in handling database exceptions - Key: YARN-1987 URL: https://issues.apache.org/jira/browse/YARN-1987 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1987.patch Per discussions in YARN-1984 and MAPREDUCE-5652, it would be nice to have a utility wrapper around leveldb's DBIterator to translate the raw RuntimeExceptions it can throw into DBExceptions to make it easier to handle database errors while iterating. -- This message was sent by Atlassian JIRA (v6.2#6252)
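Roughly what such a wrapper could look like; this is only a sketch under the assumption of the org.iq80.leveldb API, not the class from the attached patch, and it covers just two of the iterator methods:
{code}
import java.util.Map;

import org.iq80.leveldb.DBException;
import org.iq80.leveldb.DBIterator;

/** Sketch: translate raw RuntimeExceptions thrown by DBIterator into DBExceptions. */
public class WrappedDBIterator {
  private final DBIterator iter;

  public WrappedDBIterator(DBIterator iter) {
    this.iter = iter;
  }

  public boolean hasNext() {
    try {
      return iter.hasNext();
    } catch (DBException e) {
      throw e;                                   // already the typed exception
    } catch (RuntimeException e) {
      throw new DBException(e.getMessage(), e);  // wrap raw runtime errors
    }
  }

  public Map.Entry<byte[], byte[]> next() {
    try {
      return iter.next();
    } catch (DBException e) {
      throw e;
    } catch (RuntimeException e) {
      throw new DBException(e.getMessage(), e);
    }
  }
}
{code}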
[jira] [Commented] (YARN-1996) Provide alternative policies for UNHEALTHY nodes.
[ https://issues.apache.org/jira/browse/YARN-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984432#comment-13984432 ] Hadoop QA commented on YARN-1996: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642433/YARN-1996.v01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3654//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3654//console This message is automatically generated. Provide alternative policies for UNHEALTHY nodes. - Key: YARN-1996 URL: https://issues.apache.org/jira/browse/YARN-1996 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, scheduler Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1996.v01.patch Currently, UNHEALTHY nodes can significantly prolong execution of large expensive jobs as demonstrated by MAPREDUCE-5817, and downgrade the cluster health even further due to [positive feedback|http://en.wikipedia.org/wiki/Positive_feedback]. A container set that might have deemed the node unhealthy in the first place starts spreading across the cluster because the current node is declared unusable and all its containers are killed and rescheduled on different nodes. To mitigate this, we experiment with a patch that allows containers already running on a node turning UNHEALTHY to complete (drain) whereas no new container can be assigned to it until it turns healthy again. This mechanism can also be used for graceful decommissioning of NM. To this end, we have to write a health script such that it can deterministically report UNHEALTHY. For example with {code} if [ -e $1 ] ; then echo ERROR Node decommmissioning via health script hack fi {code} In the current version patch, the behavior is controlled by a boolean property {{yarn.nodemanager.unheathy.drain.containers}}. More versatile policies are possible in the future work. Currently, the health state of a node is binary determined based on the disk checker and the health script ERROR outputs. However, we can as well interpret health script output similar to java logging levels (one of which is ERROR) such as WARN, FATAL. Each level can then be treated differently. E.g., - FATAL: unusable like today - ERROR: drain - WARN: halve the node capacity. 
complemented with some equivalence rules such as 3 WARN messages == ERROR, 2*ERROR == FATAL, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1987) Wrapper for leveldb DBIterator to aid in handling database exceptions
[ https://issues.apache.org/jira/browse/YARN-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984490#comment-13984490 ] Hadoop QA commented on YARN-1987: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642479/YARN-1987.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3657//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3657//console This message is automatically generated. Wrapper for leveldb DBIterator to aid in handling database exceptions - Key: YARN-1987 URL: https://issues.apache.org/jira/browse/YARN-1987 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1987.patch Per discussions in YARN-1984 and MAPREDUCE-5652, it would be nice to have a utility wrapper around leveldb's DBIterator to translate the raw RuntimeExceptions it can throw into DBExceptions to make it easier to handle database errors while iterating. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1912) ResourceLocalizer started without any jvm memory control
[ https://issues.apache.org/jira/browse/YARN-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984485#comment-13984485 ] Hadoop QA commented on YARN-1912: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642478/YARN-1912-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3656//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3656//console This message is automatically generated. ResourceLocalizer started without any jvm memory control Key: YARN-1912 URL: https://issues.apache.org/jira/browse/YARN-1912 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: stanley shi Attachments: YARN-1912-0.patch, YARN-1912-1.patch In the LinuxContainerExecutor.java#startLocalizer, it does not specify any -Xmx configurations in the command, this caused the ResourceLocalizer to be started with default memory setting. In an server-level hardware, it will use 25% of the system memory as the max heap size, this will cause memory issue in some cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-738) TestClientRMTokens is failing irregularly while running all yarn tests
[ https://issues.apache.org/jira/browse/YARN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984587#comment-13984587 ] Hudson commented on YARN-738: - SUCCESS: Integrated in Hadoop-trunk-Commit #5584 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5584/]) YARN-738. TestClientRMTokens is failing irregularly while running all yarn tests. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1591030) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java TestClientRMTokens is failing irregularly while running all yarn tests -- Key: YARN-738 URL: https://issues.apache.org/jira/browse/YARN-738 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Ming Ma Fix For: 3.0.0, 2.5.0 Attachments: YARN-738.patch Running org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.787 sec FAILURE! testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 186 sec ERROR! java.lang.RuntimeException: getProxy at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens$YarnBadRPC.getProxy(TestClientRMTokens.java:334) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:157) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:102) at org.apache.hadoop.security.token.Token.renew(Token.java:372) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:306) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:240) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1996) Provide alternative policies for UNHEALTHY nodes.
[ https://issues.apache.org/jira/browse/YARN-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984664#comment-13984664 ] Gera Shegalov commented on YARN-1996: - [~ste...@apache.org] thanks for pointing out the JIRA about decommissioning. I'll link them to this JIRA. The main point of this JIRA is to gracefully deal with the UNHEALTHY state determined by the health script. Provide alternative policies for UNHEALTHY nodes. - Key: YARN-1996 URL: https://issues.apache.org/jira/browse/YARN-1996 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, scheduler Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1996.v01.patch Currently, UNHEALTHY nodes can significantly prolong execution of large expensive jobs as demonstrated by MAPREDUCE-5817, and downgrade the cluster health even further due to [positive feedback|http://en.wikipedia.org/wiki/Positive_feedback]. A container set that might have deemed the node unhealthy in the first place starts spreading across the cluster because the current node is declared unusable and all its containers are killed and rescheduled on different nodes. To mitigate this, we experiment with a patch that allows containers already running on a node turning UNHEALTHY to complete (drain) whereas no new container can be assigned to it until it turns healthy again. This mechanism can also be used for graceful decommissioning of NM. To this end, we have to write a health script such that it can deterministically report UNHEALTHY. For example with {code} if [ -e $1 ] ; then echo ERROR Node decommmissioning via health script hack fi {code} In the current version patch, the behavior is controlled by a boolean property {{yarn.nodemanager.unheathy.drain.containers}}. More versatile policies are possible in the future work. Currently, the health state of a node is binary determined based on the disk checker and the health script ERROR outputs. However, we can as well interpret health script output similar to java logging levels (one of which is ERROR) such as WARN, FATAL. Each level can then be treated differently. E.g., - FATAL: unusable like today - ERROR: drain - WARN: halve the node capacity. complimented with some equivalence rules such as 3 WARN messages == ERROR, 2*ERROR == FATAL, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1362) Distinguish between nodemanager shutdown for decommission vs shutdown for restart
[ https://issues.apache.org/jira/browse/YARN-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1362: - Attachment: YARN-1362.patch Small patch that enhances the NM context to provide a get/set for a decommission flag. This allows code to query whether the NM has been told to decommission and act accordingly during shutdown. Distinguish between nodemanager shutdown for decommission vs shutdown for restart - Key: YARN-1362 URL: https://issues.apache.org/jira/browse/YARN-1362 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Attachments: YARN-1362.patch When a nodemanager shuts down it needs to determine if it is likely to be restarted. If a restart is likely then it needs to preserve container directories, logs, distributed cache entries, etc. If it is being shut down more permanently (e.g. a decommission) then the nodemanager should clean up directories and logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
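In spirit, the context change amounts to something like the following sketch; the method names are illustrative and not necessarily those used in the attached patch:
{code}
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch of a decommission flag along the lines of the one added to the NM context. */
public class DecommissionFlag {
  private final AtomicBoolean decommissioned = new AtomicBoolean(false);

  public void setDecommissioned(boolean value) {
    decommissioned.set(value);
  }

  // Shutdown code can branch on this: if decommissioned, delete local dirs and logs;
  // otherwise preserve state for a likely restart.
  public boolean getDecommissioned() {
    return decommissioned.get();
  }
}
{code}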
[jira] [Commented] (YARN-1929) DeadLock in RM when automatic failover is enabled.
[ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984699#comment-13984699 ] Vinod Kumar Vavilapalli commented on YARN-1929: --- Seems 'fine' to me. It is one of those fine-for-now-but-not-sure-if-anything-else-is-broken. OTOH, we aren't getting rid of the remaining locking in CompositeService. Something that we should fix separately. Don't want this patch to blow up more. The test looks fine except for the 1second sleep. I can see that causing issues on VMs but let's see. Checking this in. DeadLock in RM when automatic failover is enabled. -- Key: YARN-1929 URL: https://issues.apache.org/jira/browse/YARN-1929 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Environment: Yarn HA cluster Reporter: Rohith Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1929-1.patch, yarn-1929-2.patch Dead lock detected in RM when automatic failover is enabled. {noformat} Found one Java-level deadlock: = Thread-2: waiting to lock monitor 0x7fb514303cf0 (object 0xef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector), which is held by main-EventThread main-EventThread: waiting to lock monitor 0x7fb514750a48 (object 0xef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService), which is held by Thread-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1612) Change Fair Scheduler to not disable delay scheduling by default
[ https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1612: -- Attachment: YARN-1612-v2.patch Change Fair Scheduler to not disable delay scheduling by default Key: YARN-1612 URL: https://issues.apache.org/jira/browse/YARN-1612 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Assignee: Chen He Attachments: YARN-1612-v2.patch, YARN-1612.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984722#comment-13984722 ] Jian He commented on YARN-1885: --- Thanks for the update!
- some places exceed the 80 column limit, like the RMAppImpl transitions.
- app.isAppFinalStateStored(): better to use isAppInFinalState instead?
- sleeping for a fixed amount of time is not deterministic; the test may fail randomly. It's better to do it in a while loop with heartbeats, and exit the loop once the condition is met.
{code}
// sleep for a while before doing the next heartbeat
Thread.sleep(1000);
NodeHeartbeatResponse response = nm1.nodeHeartbeat(true);
{code}
- timeout = 60, the timeout is too long.
- these two transitions cannot happen? Generally, we should not add events to states where the transitions can never happen; that'll hide bugs.
{code}
.addTransition(RMAppState.NEW, RMAppState.NEW,
    RMAppEventType.NODE_ADDED, new NodeAddedTransition())
.addTransition(RMAppState.NEW_SAVING, RMAppState.NEW_SAVING,
    RMAppEventType.NODE_ADDED, new NodeAddedTransition())
{code}
- These two loops may block the register RPC call for a while; I think we may send them as the payload of RMNodeStartEvent and handle them in RMNodeAddTransition?
{code}
// Handle container statuses reported by NM
if (!request.getContainerStatuses().isEmpty()) {
  LOG.info("received container statuses on node manager register :"
      + request.getContainerStatuses());
  for (ContainerStatus containerStatus : request.getContainerStatuses()) {
    handleContainerStatus(containerStatus);
  }
}
// Handle running applications reported by NM
if (null != request.getRunningApplications()) {
  for (ApplicationId appId : request.getRunningApplications()) {
    handleRunningAppOnNode(appId, request.getNodeId());
  }
}
{code}
RM may not send the finished signal to some nodes where the application ran after RM restarts - Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the CLI but I can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
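A sketch of the heartbeat-and-poll pattern suggested above in place of the fixed sleep; checkCondition is a hypothetical helper standing in for whatever the test asserts, and nm1 is the MockNM from the quoted test code:
{code}
// Poll with short sleeps instead of a single long fixed sleep, exiting as soon
// as the expected condition holds; fail if it never holds within the bound.
boolean done = false;
for (int i = 0; i < 20 && !done; i++) {
  NodeHeartbeatResponse response = nm1.nodeHeartbeat(true);
  done = checkCondition(response);
  if (!done) {
    Thread.sleep(200);
  }
}
Assert.assertTrue("expected condition not reached before timeout", done);
{code}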
[jira] [Commented] (YARN-1929) DeadLock in RM when automatic failover is enabled.
[ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984843#comment-13984843 ] Hudson commented on YARN-1929: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5585 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5585/]) YARN-1929. Fixed a deadlock in ResourceManager that occurs when failover happens right at the time of shutdown. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1591071) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/service/CompositeService.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java DeadLock in RM when automatic failover is enabled. -- Key: YARN-1929 URL: https://issues.apache.org/jira/browse/YARN-1929 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Environment: Yarn HA cluster Reporter: Rohith Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.4.1 Attachments: yarn-1929-1.patch, yarn-1929-2.patch Dead lock detected in RM when automatic failover is enabled. {noformat} Found one Java-level deadlock: = Thread-2: waiting to lock monitor 0x7fb514303cf0 (object 0xef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector), which is held by main-EventThread main-EventThread: waiting to lock monitor 0x7fb514750a48 (object 0xef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService), which is held by Thread-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1362) Distinguish between nodemanager shutdown for decommission vs shutdown for restart
[ https://issues.apache.org/jira/browse/YARN-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984884#comment-13984884 ] Hadoop QA commented on YARN-1362: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642514/YARN-1362.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3659//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3659//console This message is automatically generated. Distinguish between nodemanager shutdown for decommission vs shutdown for restart - Key: YARN-1362 URL: https://issues.apache.org/jira/browse/YARN-1362 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1362.patch When a nodemanager shuts down it needs to determine if it is likely to be restarted. If a restart is likely then it needs to preserve container directories, logs, distributed cache entries, etc. If it is being shutdown more permanently (e.g.: like a decommission) then the nodemanager should cleanup directories and logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1612) Change Fair Scheduler to not disable delay scheduling by default
[ https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984896#comment-13984896 ] Hadoop QA commented on YARN-1612: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642521/YARN-1612-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3658//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3658//console This message is automatically generated. Change Fair Scheduler to not disable delay scheduling by default Key: YARN-1612 URL: https://issues.apache.org/jira/browse/YARN-1612 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Assignee: Chen He Attachments: YARN-1612-v2.patch, YARN-1612.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984945#comment-13984945 ] Wangda Tan commented on YARN-1885: -- [~jianhe], Thanks for your review!
bq. some places exceed the 80 column limit, like the RMAppImpl transitions.
Will correct this later.
bq. app.isAppFinalStateStored(): better to use isAppInFinalState instead?
Agree, it's a bug to use isAppFinalStateStored().
bq. sleeping for a fixed amount of time is not deterministic; the test may fail randomly. It's better to do it in a while loop with heartbeats, and exit the loop once the condition is met.
Agree.
bq. timeout = 60, the timeout is too long.
Sorry for this typo :)
bq. these two transitions cannot happen? Generally, we should not add events to states where the transitions can never happen; that'll hide bugs.
Agree, and I think SUBMITTED also cannot happen, because an app in the SUBMITTED state doesn't launch any container, so NMs will not have the app in their runningApplication list. Do you agree?
bq. These two loops may block the register RPC call for a while; I think we may send them as the payload of RMNodeStartEvent and handle them in RMNodeAddTransition?
IMO, this shouldn't be a big problem, because there are no blocking calls in handleRunningAppOnNode/handleContainerStatus, so the additional microseconds of latency (just looping over an array) should be fine. Is that OK?
Attached new patch. RM may not send the finished signal to some nodes where the application ran after RM restarts - Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the CLI but I can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1885: - Attachment: YARN-1885.patch RM may not send the finished signal to some nodes where the application ran after RM restarts - Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1696: -- Attachment: YARN-1696.6.patch Same patch as before but with a few edits to make it better. Will check this in once Jenkins says okay. Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Priority: Blocker Attachments: YARN-1676.5.patch, YARN-1696-3.patch, YARN-1696.2.patch, YARN-1696.4.patch, YARN-1696.6.patch, rm-ha-overview.png, rm-ha-overview.svg, yarn-1696-1.patch Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985067#comment-13985067 ] Hadoop QA commented on YARN-1696: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642567/YARN-1696.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3660//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3660//console This message is automatically generated. Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Priority: Blocker Attachments: YARN-1676.5.patch, YARN-1696-3.patch, YARN-1696.2.patch, YARN-1696.4.patch, YARN-1696.6.patch, rm-ha-overview.png, rm-ha-overview.svg, yarn-1696-1.patch Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1998) Change the time zone on the Yarn UI to the local time zone
Fengdong Yu created YARN-1998: - Summary: Change the time zone on the Yarn UI to the local time zone Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor It shows the GMT time zone for 'startTime' and 'finishTime' on the RM web UI; we should show the local time zone. -- This message was sent by Atlassian JIRA (v6.2#6252)
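For illustration only, rendering an epoch-millisecond timestamp in the server's local time zone in Java looks roughly like the following; the actual rendering path in the RM web UI may differ:
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class LocalTimeRender {
  public static String render(long epochMillis) {
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    fmt.setTimeZone(TimeZone.getDefault());  // local zone instead of GMT
    return fmt.format(new Date(epochMillis));
  }

  public static void main(String[] args) {
    System.out.println(render(System.currentTimeMillis()));
  }
}
{code}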
[jira] [Updated] (YARN-1998) Change the time zone on the Yarn UI to the local time zone
[ https://issues.apache.org/jira/browse/YARN-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1998: -- Attachment: YARN-1998.patch Change the time zone on the Yarn UI to the local time zone -- Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1998.patch It shows GMT time zone for 'startTime' and 'finishTime' on the RM web UI, we should show the local time zone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1998) Change the time zone on the RM web UI to the local time zone
[ https://issues.apache.org/jira/browse/YARN-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1998: -- Summary: Change the time zone on the RM web UI to the local time zone (was: Change the time zone on the Yarn UI to the local time zone) Change the time zone on the RM web UI to the local time zone Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1998.patch It shows GMT time zone for 'startTime' and 'finishTime' on the RM web UI, we should show the local time zone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1999) Move HistoryServerRest.apt.vm into the Mapreduce section
[ https://issues.apache.org/jira/browse/YARN-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-1999: --- Affects Version/s: 2.4.0 Move HistoryServerRest.apt.vm into the Mapreduce section Key: YARN-1999 URL: https://issues.apache.org/jira/browse/YARN-1999 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Ravi Prakash Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm into the MapReduce section where it really belongs? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1999) Move HistoryServerRest.apt.vm into the Mapreduce section
[ https://issues.apache.org/jira/browse/YARN-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-1999: --- Component/s: documentation Move HistoryServerRest.apt.vm into the Mapreduce section Key: YARN-1999 URL: https://issues.apache.org/jira/browse/YARN-1999 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Ravi Prakash Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm into the MapReduce section where it really belongs? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1999) Move HistoryServerRest.apt.vm into the Mapreduce section
[ https://issues.apache.org/jira/browse/YARN-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-1999: --- Target Version/s: 2.5.0 Move HistoryServerRest.apt.vm into the Mapreduce section Key: YARN-1999 URL: https://issues.apache.org/jira/browse/YARN-1999 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Ravi Prakash Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm into the MapReduce section where it really belongs? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1999) Move HistoryServerRest.apt.vm into the Mapreduce section
Ravi Prakash created YARN-1999: -- Summary: Move HistoryServerRest.apt.vm into the Mapreduce section Key: YARN-1999 URL: https://issues.apache.org/jira/browse/YARN-1999 Project: Hadoop YARN Issue Type: Bug Reporter: Ravi Prakash Now that we have the YARN HistoryServer, perhaps we should move HistoryServerRest.apt.vm into the MapReduce section where it really belongs? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985083#comment-13985083 ] Hadoop QA commented on YARN-1885: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642551/YARN-1885.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3661//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3661//console This message is automatically generated. RM may not send the finished signal to some nodes where the application ran after RM restarts - Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1998) Change the time zone on the RM web UI to the local time zone
[ https://issues.apache.org/jira/browse/YARN-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985105#comment-13985105 ] Hadoop QA commented on YARN-1998: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642581/YARN-1998.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3662//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3662//console This message is automatically generated. Change the time zone on the RM web UI to the local time zone Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1998.patch It shows GMT time zone for 'startTime' and 'finishTime' on the RM web UI, we should show the local time zone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2001) Persist NMs info for RM restart
Jian He created YARN-2001: - Summary: Persist NMs info for RM restart Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He RM should not accept allocate requests from AMs until all the NMs have registered with RM. For that, RM needs to remember the previous NMs and wait for all the NMs to register. This is also useful for remembering decommissioned nodes across restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
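To make the intent of YARN-2001 concrete, a registration gate of roughly the following shape would cover the behaviour described above. This is only a sketch under assumptions: the class name, the NM-id strings, and the idea of loading the expected node set from the state store are illustrative, not the actual RMStateStore or ResourceTrackerService changes this sub-task will make.
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical gate: hold (or empty-answer) AM allocate calls until every
// previously known NM has re-registered after an RM restart.
class NodeRegistrationGate {
    private final Set<String> expectedNodes;                          // NM ids remembered across restart
    private final Set<String> registeredNodes = ConcurrentHashMap.newKeySet();

    NodeRegistrationGate(Set<String> expectedNodes) {
        this.expectedNodes = expectedNodes;
    }

    void onNodeRegistered(String nodeId) {
        registeredNodes.add(nodeId);
    }

    // Allocate requests from AMs would only be honoured once this returns true.
    boolean canAcceptAllocateRequests() {
        return registeredNodes.containsAll(expectedNodes);
    }
}
{code}
A timeout for NMs that never come back (for example decommissioned or dead nodes) would be needed on top of such a gate.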
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985148#comment-13985148 ] Jian He commented on YARN-556: -- Hi Anubhav, I looked at the prototype patch. Regarding the approach, it's better to have a scheduler-agnostic recovery mechanism with no or minimal scheduler-specific changes, instead of implementing recovery separately for each scheduler. YARN-1368 can be renamed to accommodate the necessary common changes for all schedulers. Also, adding the cluster timestamp to the container ID doesn't seem right, and it would break compatibility. RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1368: -- Summary: Common work to re-populate containers’ state into scheduler (was: RM should populate running container allocation information from NM resync) Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Anubhav Dhoot YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985151#comment-13985151 ] Jian He commented on YARN-1368: --- Hi [~adhoot], mind if I take this over? I have a preliminary patch that does the bulk of the work and can upload it very soon. Thanks. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Anubhav Dhoot YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985160#comment-13985160 ] Sunil G commented on YARN-1963: --- We have done some analysis and implemented support for application priority. I would like to share the thoughts here; kindly review them. Design thoughts: 1. Configuration part: We plan to reuse the existing priority configuration given below to set a job's priority. a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority) b. We can also use the configuration mapreduce.job.priority. The priority values can be VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW. 2. Scheduler side: If a Capacity Scheduler queue has multiple applications (jobs) with different priorities, CS will allocate containers for the highest-priority application first, then for the next priority, and so on. When multiple queues are configured with different capacities, this priority ordering works internally to each queue. For this, we plan to add a priority comparison in the data structure below: Comparator<FiCaSchedulerApp> applicationComparator. We added a priority check in compare() of applicationComparator while selecting applications. The updated ordering is: 1. Check priority first; if the priorities differ, pick the highest-priority job. 2. Otherwise continue with the existing logic, such as the application ID and timestamp comparison. With these changes, the highest-priority job gets preference within a queue. NB: In addition, we added a preemption module so that high-priority jobs obtain resources quickly by preempting lower-priority ones. I will upload a patch if this approach looks fine. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Arun C Murthy It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
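For readers skimming the comment above: the scheduler-side change is essentially prepending a priority check to the comparator the CapacityScheduler already uses to order applications. A minimal sketch follows; PriorityApp, its fields, and the "larger value wins" convention are stand-ins invented for the example, not the real FiCaSchedulerApp API or the contents of any attached patch.
{code:java}
import java.util.Comparator;

// Illustrative application handle; not a real CapacityScheduler class.
class PriorityApp {
    final int priority;        // assumed: larger value = higher priority
    final long applicationId;  // assumed: smaller id = submitted earlier

    PriorityApp(int priority, long applicationId) {
        this.priority = priority;
        this.applicationId = applicationId;
    }
}

class PriorityThenIdComparator implements Comparator<PriorityApp> {
    @Override
    public int compare(PriorityApp a, PriorityApp b) {
        // 1. Check priority first: higher-priority applications sort earlier.
        if (a.priority != b.priority) {
            return Integer.compare(b.priority, a.priority);
        }
        // 2. Otherwise fall back to the existing ordering (application id / submit time).
        return Long.compare(a.applicationId, b.applicationId);
    }
}
{code}
Plugged into the queue's application ordering, a comparator of this shape keeps the current FIFO behaviour for equal priorities while letting higher-priority applications be offered containers first.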
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985164#comment-13985164 ] Jian He commented on YARN-1368: --- It's good to have a scheduler-agnostic way to recover the containers and all the other scheduler state for apps/attempts. I have renamed the title to reflect this. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Anubhav Dhoot YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
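To illustrate what "send this information to the schedulers along with the NODE_ADDED_EVENT" could look like in a scheduler-agnostic layer, here is a rough sketch. All of the types and method names below are invented for the example; they are not the actual YARN events, scheduler interfaces, or the patch being prepared here.
{code:java}
import java.util.List;

// Hypothetical view of a container the NM reports as still running when it registers.
interface RunningContainerReport {
    String getApplicationId();
    int getMemoryMb();
    int getVcores();
}

// Hypothetical hook a scheduler would implement once, instead of each scheduler
// re-implementing the whole recovery walk.
interface RecoverableScheduler {
    void recoverContainer(String appId, int memoryMb, int vcores, String nodeId);
}

class NodeAddedRecovery {
    private final RecoverableScheduler scheduler;

    NodeAddedRecovery(RecoverableScheduler scheduler) {
        this.scheduler = scheduler;
    }

    // Invoked when an NM re-registers after an RM restart and reports its live containers.
    void onNodeAdded(String nodeId, List<RunningContainerReport> containers) {
        for (RunningContainerReport c : containers) {
            // The common layer translates the NM report into per-application accounting,
            // so FIFO/Capacity/Fair schedulers only need to implement recoverContainer().
            scheduler.recoverContainer(c.getApplicationId(), c.getMemoryMb(), c.getVcores(), nodeId);
        }
    }
}
{code}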
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985193#comment-13985193 ] Sandy Ryza commented on YARN-1963: -- Thanks for picking this up Sunil. Can we separate this into a couple JIRAs? One for the ResourceManager and protocol changes, one for the MapReduce changes, and one for the Capacity Scheduler changes. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Arun C Murthy It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)