[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2016-01-05 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082641#comment-15082641
 ] 

zhihai xu commented on YARN-3697:
-

[~djp], yes, I just committed it to branch-2.6. thanks

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> 
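For reference, a minimal sketch (not the committed YARN-3697 patch) of how the catch block quoted above could handle the interrupt separately, so that the thread's run() loop can observe it and exit; the names follow the snippet quoted earlier:
{code}
  try {
    if (node != null && Resources.fitsIn(minimumAllocation,
        node.getAvailableResource())) {
      attemptScheduling(node);
    }
  } catch (Throwable ex) {
    // Don't swallow interrupts: restore the interrupt flag and bail out so
    // ContinuousSchedulingThread can shut down when the scheduler is stopped.
    if (ex instanceof InterruptedException
        || ex.getCause() instanceof InterruptedException) {
      Thread.currentThread().interrupt();
      return;
    }
    LOG.error("Error while attempting scheduling for node " + node +
        ": " + ex.toString(), ex);
  }
{code}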

[jira] [Updated] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2016-01-05 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3697:

Fix Version/s: 2.6.4

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> 

[jira] [Updated] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-05 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4538:
---
Priority: Major  (was: Critical)

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Submit 2 applications to the default queue. 
> Check the queue metrics for pending cores and memory:
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output:*
> -20
> -20480
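For reference, a self-contained version of the reproduction snippet above, using the same YarnClient calls; the class name is illustrative, and main simply throws Exception to keep the sketch short:
{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.api.records.QueueStatistics;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class PendingQueueMetricsCheck {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      // Same walk as in the description: children of root, print pending stats.
      List<QueueInfo> allQueues = client.getChildQueueInfos("root");
      for (QueueInfo queueInfo : allQueues) {
        QueueStatistics quastats = queueInfo.getQueueStatistics();
        System.out.println(quastats.getPendingVCores());
        System.out.println(quastats.getPendingMemoryMB());
      }
    } finally {
      client.stop();
    }
  }
}
{code}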



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082790#comment-15082790
 ] 

Hadoop QA commented on YARN-3446:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
2s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 40s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 31s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 46s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780487/YARN-3446.004.patch |
| JIRA Issue | YARN-3446 |
| 

[jira] [Created] (YARN-4543) TestNodeStatusUpdater.testStopReentrant fails + JUnit misusage

2016-01-05 Thread Akihiro Suda (JIRA)
Akihiro Suda created YARN-4543:
--

 Summary: TestNodeStatusUpdater.testStopReentrant fails + JUnit 
misusage
 Key: YARN-4543
 URL: https://issues.apache.org/jira/browse/YARN-4543
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Akihiro Suda
Priority: Minor


{panel}
TestNodeStatusUpdater.testStopReentrant:1269 expected:<0> but was:<1>
{panel}

https://github.com/apache/hadoop/blob/4ac6799d4a8b071e0d367c2d709e84d8ea06942d/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java#L1269

The corresponding JUnit assertion code is: 
{panel}
 Assert.assertEquals(numCleanups.get(), 1);
{panel}

It seems that the 1st arg and the 2nd one should be swapped.
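With the arguments in JUnit's (expected, actual) order, the assertion would read as follows (a one-line fix sketch, not taken from any attached patch):
{code}
// JUnit's assertEquals takes the expected value first, then the actual value.
Assert.assertEquals(1, numCleanups.get());
{code}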




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2016-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082661#comment-15082661
 ] 

Hudson commented on YARN-3697:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9049 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9049/])
update CHANGES.txt to add YARN-3697 to 2.6.4 release (zxu: rev 
96d8f1d6f30299831dde3985a6a3bd08cb36fd38)
* hadoop-yarn-project/CHANGES.txt


> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread sometimes can't be shut down after 
> stop. The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> 

[jira] [Updated] (YARN-4544) All the log messages about rolling monitoring interval are shown with the WARN level

2016-01-05 Thread Takashi Ohnishi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takashi Ohnishi updated YARN-4544:
--
Attachment: YARN-4544.patch

Attached a patch.

> All the log messages about rolling monitoring interval are shown with the 
> WARN level
> 
>
> Key: YARN-4544
> URL: https://issues.apache.org/jira/browse/YARN-4544
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.7.1
>Reporter: Takashi Ohnishi
> Attachments: YARN-4544.patch
>
>
> For yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds, 
> there are three different log messages depending on the value set for this 
> parameter, but all of them are logged at the WARN level.
> (a) disabled (default)
> {code}
> 2016-01-05 22:19:29,062 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorInterval is set as -1. The log rolling mornitoring interval is 
> disabled. The logs will be aggregated after this application is finished.
> {code}
> (b) enabled
> {code}
> 2016-01-06 00:41:15,808 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorInterval is set as 7200. The logs will be aggregated every 
> 7200 seconds
> {code}
> (c) enabled but wrong configuration
> {code}
> 2016-01-06 00:39:50,820 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorIntervall should be more than or equal to 3600 seconds. Using 
> 3600 seconds instead.
> {code}
> I think it is better to log at WARN only in case (c); logging at INFO is fine 
> in cases (a) and (b).
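A hedged sketch of the kind of change this suggests in AppLogAggregatorImpl (I have not verified the attached patch; the log texts are copied from the messages above, while the constant, field, and method names here are hypothetical and LOG is assumed to be the class's logger):
{code}
private static final long MIN_ROLL_MONITORING_INTERVAL_SECONDS = 3600;

private long checkRollingMonitorInterval(long configuredInterval) {
  if (configuredInterval <= 0) {
    // case (a): rolling is disabled -- informational, not a warning
    LOG.info("rollingMonitorInterval is set as " + configuredInterval
        + ". The log rolling monitoring interval is disabled. The logs will be"
        + " aggregated after this application is finished.");
    return configuredInterval;
  }
  if (configuredInterval < MIN_ROLL_MONITORING_INTERVAL_SECONDS) {
    // case (c): misconfigured -- this one deserves WARN
    LOG.warn("rollingMonitorInterval should be more than or equal to "
        + MIN_ROLL_MONITORING_INTERVAL_SECONDS + " seconds. Using "
        + MIN_ROLL_MONITORING_INTERVAL_SECONDS + " seconds instead.");
    return MIN_ROLL_MONITORING_INTERVAL_SECONDS;
  }
  // case (b): valid value -- informational
  LOG.info("rollingMonitorInterval is set as " + configuredInterval
      + ". The logs will be aggregated every " + configuredInterval
      + " seconds");
  return configuredInterval;
}
{code}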



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083308#comment-15083308
 ] 

Sunil G commented on YARN-4537:
---

Thanks Rohith. Yes, it's fine.

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch, 
> 0003-YARN-4537.patch, 0003-YARN-4537.patch
>
>
> Currently, priority comparison is integrated into FifoComparator. A separate 
> comparator should be defined for priority comparison so that, down the line, 
> any new ordering policy that wants to integrate priority can use a compound 
> comparator in which priority takes precedence. 
> The following changes are expected to be done as part of this JIRA:
> # Pull out priority comparison from FifoComparator
> # Define a new priority comparator
> # Use a compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, then FIFO
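As an illustration of the compound-comparator idea (a generic sketch, not code from the attached patches): the priority comparator is consulted first, and the existing FIFO comparator breaks ties.
{code}
import java.util.Comparator;

// Generic sketch: prefer the primary (priority) ordering and fall back to the
// secondary (FIFO) ordering only when the primary considers two entities equal.
public class CompoundComparator<T> implements Comparator<T> {
  private final Comparator<T> primary;   // e.g. a priority comparator
  private final Comparator<T> secondary; // e.g. the existing FIFO comparator

  public CompoundComparator(Comparator<T> primary, Comparator<T> secondary) {
    this.primary = primary;
    this.secondary = secondary;
  }

  @Override
  public int compare(T left, T right) {
    int result = primary.compare(left, right);
    return result != 0 ? result : secondary.compare(left, right);
  }
}
{code}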



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4544) All the log messages about rolling monitoring interval are shown with the WARN level

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083327#comment-15083327
 ] 

Hadoop QA commented on YARN-4544:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 27s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 58s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 10s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780568/YARN-4544.patch |
| JIRA Issue | YARN-4544 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux a3e3cbeec60d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083378#comment-15083378
 ] 

Hadoop QA commented on YARN-4537:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 22s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 9, now 9). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 14s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 22s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 31s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 213m 41s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Commented] (YARN-4425) Pluggable sharing policy for Partition Node Label resources

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083404#comment-15083404
 ] 

Hadoop QA commented on YARN-4425:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} Patch generated 21 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 429, now 443). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 9 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 103 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 3 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 28s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91
 with JDK v1.7.0_91 generated 12 new issues (was 2, now 14). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 6s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 7s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | 

[jira] [Updated] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4537:

Attachment: 0003-YARN-4537.patch

[~sunilg] I updated the patch to avoid the expression evaluation, which should 
improve performance.

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch, 
> 0003-YARN-4537.patch, 0003-YARN-4537.patch
>
>
> Currently, priority comparison is integrated into FifoComparator. A separate 
> comparator should be defined for priority comparison so that, down the line, 
> any new ordering policy that wants to integrate priority can use a compound 
> comparator in which priority takes precedence. 
> The following changes are expected to be done as part of this JIRA:
> # Pull out priority comparison from FifoComparator
> # Define a new priority comparator
> # Use a compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, then FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-01-05 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4311:
--
Attachment: YARN-4311-v4.patch

Thanks [~templedf]. Attaching an updated patch. I removed the null check on the 
entry set after some thought on how the entries are updated for an iterator, and 
made the suggested changes as well.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch, YARN-4311-v4.patch
>
>
> In order to fully forget about a node, removing the node from the include and 
> exclude lists is not sufficient: the RM still lists it under decommissioned 
> nodes. The tricky part that [~jlowe] pointed out is the case when include lists 
> are not used; in that case we don't want the nodes to fall off if they are not 
> active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083283#comment-15083283
 ] 

Sunil G commented on YARN-4479:
---

Patch looks good [~rohithsharma].

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, 
> 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, 
> 0005-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and active 
> applications. When priority is configured for applications, high-priority 
> applications get activated first during recovery. It is possible that a 
> low-priority job was submitted earlier and is already in the running state. 
> This can starve the low-priority job after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083305#comment-15083305
 ] 

Rohith Sharma K S commented on YARN-4537:
-

I have made one small update and uploaded it.

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch, 
> 0003-YARN-4537.patch, 0003-YARN-4537.patch
>
>
> Currently, priority comparison is integrated into FifoComparator. A separate 
> comparator should be defined for priority comparison so that, down the line, 
> any new ordering policy that wants to integrate priority can use a compound 
> comparator in which priority takes precedence. 
> The following changes are expected to be done as part of this JIRA:
> # Pull out priority comparison from FifoComparator
> # Define a new priority comparator
> # Use a compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, then FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083381#comment-15083381
 ] 

Varun Saxena commented on YARN-4224:


bq. One quick thing is about TimelineReaderContext. This class appears to have 
a lot of duplicated logic as TimelineCollectorContext, and serves as a similar 
role. Do we want to reorganize them or consolidate them?
I see. For TimelineReaderContext we would need the entity type and entity id in 
addition to the fields in TimelineCollectorContext. Maybe we can rename 
TimelineCollectorContext to a TimelineContext class and extend it as 
TimelineReaderContext, which would contain the two extra fields.

bq. In the original design, /flows will return a list of flows with latest 
activities on the cluster. Shall we keep this endpoint (as I noticed, /flows 
endpoint is not touched in this patch), or attach it to somewhere else in our 
hierarchy? 
I didn't quite get your question.
The /flows endpoint is for querying active flows with the default cluster id, and 
the /clusters/\{clusterid\}/flows/ endpoint is for querying flows for a specific 
cluster id.
I didn't keep a UID-related endpoint for flows because the cluster id by itself 
is a UID.
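In other words, the two endpoint shapes being contrasted are (paths as given above, with any common version prefix omitted since it is not stated here):
{noformat}
/flows                           active flows, default cluster id
/clusters/{clusterid}/flows      flows for a specific cluster id
{noformat}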

> Support fetching entities by UID and change the REST interface to conform to 
> current REST APIs' in YARN
> ---
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083309#comment-15083309
 ] 

Sunil G commented on YARN-4537:
---

Thanks Rohith. Yes, it's fine.

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch, 
> 0003-YARN-4537.patch, 0003-YARN-4537.patch
>
>
> Currently, priority comparison is integrated into FifoComparator. A separate 
> comparator should be defined for priority comparison so that, down the line, 
> any new ordering policy that wants to integrate priority can use a compound 
> comparator in which priority takes precedence. 
> The following changes are expected to be done as part of this JIRA:
> # Pull out priority comparison from FifoComparator
> # Define a new priority comparator
> # Use a compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, then FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3692) Allow REST API to set a user generated message when killing an application

2016-01-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3692:

Description: 
Currently YARN's REST API supports killing an application without setting a 
diagnostic message. It would be good to provide that support.

*Use Case*
Usually this helps in workflow management in a multi-tenant environment when 
the workflow scheduler (or the hadoop admin) wants to kill a job - and let the 
user know the reason why the job was killed. Killing the job by setting a 
diagnostic message is a very good solution for that. Ideally, we can set the 
diagnostic message on all such interfaces:
yarn kill -applicationId ... -diagnosticMessage "some message added by 
admin/workflow"
REST API { 'state': 'KILLED', 'diagnosticMessage': 'some message added by 
admin/workflow'}

  was:Currently YARN's REST API supports killing an application without setting 
a diagnostic message. It would be good to provide that support.


> Allow REST API to set a user generated message when killing an application
> --
>
> Key: YARN-3692
> URL: https://issues.apache.org/jira/browse/YARN-3692
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajat Jain
>Assignee: Rohith Sharma K S
>
> Currently YARN's REST API supports killing an application without setting a 
> diagnostic message. It would be good to provide that support.
> *Use Case*
> Usually this helps in workflow management in a multi-tenant environment when 
> the workflow scheduler (or the hadoop admin) wants to kill a job - and let 
> the user know the reason why the job was killed. Killing the job by setting a 
> diagnostic message is a very good solution for that. Ideally, we can set the 
> diagnostic message on all such interfaces:
> yarn kill -applicationId ... -diagnosticMessage "some message added by 
> admin/workflow"
> REST API { 'state': 'KILLED', 'diagnosticMessage': 'some message added by 
> admin/workflow'}
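For illustration, the proposed call could look like the following against the RM's existing application-state endpoint; the diagnosticMessage field is the new part being proposed here, and the host, port, and application id are placeholders:
{noformat}
PUT http://<rm-host>:8088/ws/v1/cluster/apps/{application-id}/state
Content-Type: application/json

{"state": "KILLED", "diagnosticMessage": "some message added by admin/workflow"}
{noformat}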



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083163#comment-15083163
 ] 

Junping Du commented on YARN-4265:
--

Thanks [~gtCarrera9] for the reply and for updating the patch. 
I am still reviewing the new patch, but here are some quick responses:
bq.  Plugins are provided by third-party applications such as Tez. Right now we 
cannot assume which exact entity group plugin the user is using, therefore we 
have to conservatively leave this config as empty. 
Thanks for the clarification. Can we provide a default one (it could be simple) 
for built-in YARN applications, i.e. the distributed shell? This would help 
demonstrate how to use it for applications other than Tez; otherwise other apps 
have to refer to the Tez code to learn how to use it. IMO, this is not a very 
good practice for YARN.

bq. we're using fine-grained locking for each cached item in the reader cache, 
and cache refresh only happens infrequently (~10 secs by default), so maybe 
we'd like to stabilize the whole synchronization story before fine tune this 
part?
Sounds like a plan. It looks like we still need to put effort into checking the 
whole synchronization mechanism, doesn't it?

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have behavior similar to the 
> EntityFileTimelineStore proposed in YARN-3942, but cache data at cache-id 
> granularity instead of application-id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083223#comment-15083223
 ] 

Karthik Kambatla edited comment on YARN-1011 at 1/5/16 3:28 PM:


We would run an opportunistic container on a node only if the actual 
utilization is less than the allocation by a margin bigger than the allocation 
of said opportunistic container. We reactively preempt the opportunistic 
container if the actual utilization goes over a threshold. To address spikes in 
usage where our reactive measures are too slow to kick in, we run the 
opportunistic containers at a strictly lower priority. 

bq. the app got opportunistic containers and their perf wasnt the same as 
normal containers - so it ran slower. 
As soon as we realize the perf is slower because the node has higher usage than 
we had anticipated, we preempt the container and retry allocation (guaranteed 
or opportunistic depending on the new cluster state). So, it shouldn't run 
slower for longer than our monitoring interval. Is this assumption okay? 

bq. However, things get complicated because a node with an opportunistic 
container may continue to run its normal containers while space frees up for 
guaranteed capacity on other nodes.
The opportunistic container will continue to run on this node so long as it is 
getting the resources it needs. If there is any sort of resource contention, it 
is preempted and is up for allocation on one of the free nodes. 

bq. This would require that the system upgrade opportunistic containers in the 
same order as it would allocate containers.
bq. IMO, the NM cannot make a local choice about upgrading its opportunistic 
containers because this is effectively a resource allocation decision and only 
the RM has the info to do that.
The RM schedules the next highest priority "task" for which it couldn't find a 
guaranteed container as an opportunistic container. This task continues to run 
as long as it is not getting enough resources. If there is no resource 
contention, the task continues to run. If guaranteed resources free up on the 
node it is running on, isn't it fair to promote the container to guaranteed? 
After all, if the unused resources were not hidden behind other containers' 
allocations and had actually been available as guaranteed capacity on that node 
initially, the RM would just have scheduled a guaranteed container in the first 
place.

I should probably clarify that the proposal here targets those cases where 
users' estimates are significantly off from reality and there are enough free 
resources per node to run additional task(s) without causing any resource 
contention. Even though this is the norm, we want to guard against spikes in 
usage to avoid perf regressions. In practice, I expect admins to come up with a 
reasonable threshold for over-subscription: e.g. 0.8 - we only oversubscribe up 
to 80% of the capacity advertised through 
{{yarn.nodemanager.resource.*}}. Thinking more about this, this threshold should 
have an upper limit - 0.95? 
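A minimal sketch of the admission check described above, assuming memory-only resources; the class, method, and threshold names are illustrative and not from any patch:
{code}
// Illustrative only: admit an opportunistic container on a node when measured
// utilization plus the container's allocation stays under an admin-configured
// fraction (e.g. 0.8) of the capacity advertised via yarn.nodemanager.resource.*.
public final class OverSubscriptionCheck {
  private OverSubscriptionCheck() {
  }

  public static boolean canRunOpportunistic(long nodeCapacityMb,
      long nodeUtilizationMb, long containerAllocationMb,
      double overSubscriptionThreshold) {
    return nodeUtilizationMb + containerAllocationMb
        <= overSubscriptionThreshold * nodeCapacityMb;
  }
}
{code}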



was (Author: kasha):
We would run an opportunistic container on a node only if the actual 
utilization is less than the allocation by a margin bigger than the allocation 
of said opportunistic container. We reactively preempt the opportunistic 
container if the actual utilization goes over a threshold. To address spikes in 
usage where our reactive measures are too slow to kick in, we run the 
opportunistic containers at a strictly lower priority. 

bq. the app got opportunistic containers and their perf wasnt the same as 
normal containers - so it ran slower. 
As soon as we realize the perf is slower because the node has higher usage than 
we had anticipated, we preempt the container and retry allocation (guaranteed 
or opportunistic depending on the new cluster state). So, it shouldn't run 
slower for longer than our monitoring interval. Is this assumption okay? 

bq. However, things get complicated because a node with an opportunistic 
container may continue to run its normal containers while space frees up for 
guaranteed capacity on other nodes.
The opportunistic container will continue to run on this node so long as it is 
getting the resources it needs. If there is any sort of resource contention, it 
is preempted and is up for allocation on one of the free nodes. 

bq. This would require that the system upgrade opportunistic containers in the 
same order as it would allocate containers.
bq. IMO, the NM cannot make a local choice about upgrading its opportunistic 
containers because this is effectively a resource allocation decision and only 
the RM has the info to do that.
The RM schedules the next highest priority "task" for which it couldn't find a 
guaranteed container as an opportunistic container. This task continues to run 
as long as it is not getting enough resources. If there is no resource 
contention, the task continues to run. If 

[jira] [Updated] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4479:

Attachment: 0005-YARN-4479.patch

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, 
> 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, 
> 0005-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, the high 
> priority applications get activated first during recovery. It is possible 
> that a low priority job had been submitted earlier and was already in the 
> running state. 
> This can leave the low priority job starving after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4425) Pluggable sharing policy for Partition Node Label resources

2016-01-05 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4425:

Attachment: YARN-4425.20160105-1.patch
ResourceSharingPolicyForNodeLabelsPartitions-V2.pdf

Hi [~wangda], I am uploading the initial POC patch to share the approach 
which I have in mind. Please take a look at it and we can discuss further.
If required, I can add more testcases and a few more configurations to show 
the full capability of the interface exposed.
 

> Pluggable sharing policy for Partition Node Label resources
> ---
>
> Key: YARN-4425
> URL: https://issues.apache.org/jira/browse/YARN-4425
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: ResourceSharingPolicyForNodeLabelsPartitions-V1.pdf, 
> ResourceSharingPolicyForNodeLabelsPartitions-V2.pdf, 
> YARN-4425.20160105-1.patch
>
>
> As part of the support for sharing NonExclusive Node Label partitions in 
> YARN-3214, NonExclusive partitions are shared only with the Default 
> Partition, and there is a fixed rule for when apps in the default partition 
> make use of resources of any NonExclusive partitions.
> There are many scenarios where we require a pluggable policy (MultiTenant, 
> Hierarchical, etc.) in which each partition can determine when it wants to 
> share its resources with other partitions and when other partitions may use 
> resources from it.
> More details in the attached document.
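
(To illustrate what "pluggable" could mean here, a purely hypothetical interface shape follows; it is not taken from the attached design document.)
{code}
import org.apache.hadoop.yarn.api.records.Resource;

/**
 * Hypothetical shape of a pluggable sharing policy, only to illustrate the
 * idea; it is not taken from the attached design document.
 */
public interface PartitionResourceSharingPolicy {

  /**
   * How much of its currently idle resource this partition is willing to
   * lend out to other partitions.
   */
  Resource getShareableResource(String partition, Resource idleResource);

  /**
   * Whether applications submitted to requestingPartition may borrow the
   * shareable resources of lendingPartition.
   */
  boolean canBorrowFrom(String requestingPartition, String lendingPartition);
}
{code}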



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4544) All the log messages about rolling monitoring interval are shown with the WARN level

2016-01-05 Thread Takashi Ohnishi (JIRA)
Takashi Ohnishi created YARN-4544:
-

 Summary: All the log messages about rolling monitoring interval 
are shown with the WARN level
 Key: YARN-4544
 URL: https://issues.apache.org/jira/browse/YARN-4544
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation, nodemanager
Affects Versions: 2.7.1
Reporter: Takashi Ohnishi


Depending on the value set for 
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds, three 
different log messages can be emitted, but all of them are logged at the WARN 
level.

(a) disabled (default)
{code}
2016-01-05 22:19:29,062 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
{code}

(b) enabled
{code}
2016-01-06 00:41:15,808 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as 7200. The logs will be aggregated every 7200 seconds
{code}

(c) enabled but wrong configuration
{code}
2016-01-06 00:39:50,820 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorIntervall should be more than or equal to 3600 seconds. Using 3600 seconds instead.
{code}

I think it is better to log at the WARN level only in case (c); cases (a) and 
(b) can be logged at the INFO level.
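
(A rough sketch of the proposed level change; this is not the actual AppLogAggregatorImpl code, and the names and logging API below are assumptions.)
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class RollingIntervalLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(RollingIntervalLogging.class);
  // 3600s lower bound, as in message (c); the constant name is made up.
  private static final long MIN_ROLLING_INTERVAL_SECONDS = 3600;

  static long resolveInterval(long configured) {
    if (configured <= 0) {
      // case (a): the default/disabled setting is normal operation -> INFO
      LOG.info("rollingMonitorInterval is set as {}. The log rolling "
          + "monitoring interval is disabled. The logs will be aggregated "
          + "after this application is finished.", configured);
      return configured;
    }
    if (configured < MIN_ROLLING_INTERVAL_SECONDS) {
      // case (c): a misconfiguration -> keep WARN
      LOG.warn("rollingMonitorInterval should be more than or equal to {} "
          + "seconds. Using {} seconds instead.",
          MIN_ROLLING_INTERVAL_SECONDS, MIN_ROLLING_INTERVAL_SECONDS);
      return MIN_ROLLING_INTERVAL_SECONDS;
    }
    // case (b): a valid interval is normal operation -> INFO
    LOG.info("rollingMonitorInterval is set as {}. The logs will be "
        + "aggregated every {} seconds", configured, configured);
    return configured;
  }
}
{code}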




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083287#comment-15083287
 ] 

Sunil G commented on YARN-4537:
---

+1. Looks good.

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch, 
> 0003-YARN-4537.patch
>
>
> Currently, priority comparison is integrated with FifoComparator. There 
> should be a separate comparator defined for priority comparison so that, down 
> the line, if any new ordering policy wants to integrate priority, it can use 
> a compound comparator where priority takes higher preference. 
> The following changes are expected to be done as part of this JIRA:
> # Pull out priority comparison from FifoComparator
> # Define a new priority comparator
> # Use a compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4425) Pluggable sharing policy for Partition Node Label resources

2016-01-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083167#comment-15083167
 ] 

Naganarasimha G R commented on YARN-4425:
-

Further, I tried to take into account your comment ??I can understand why we 
need two separate policies, maybe we can consider to merge two policies to 
one: currently exclusivity of node label is just a hint, sharing policy could 
decide if and how much resources could be shared from each partition??.
I hope I got the comment right?
 

> Pluggable sharing policy for Partition Node Label resources
> ---
>
> Key: YARN-4425
> URL: https://issues.apache.org/jira/browse/YARN-4425
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: ResourceSharingPolicyForNodeLabelsPartitions-V1.pdf, 
> ResourceSharingPolicyForNodeLabelsPartitions-V2.pdf, 
> YARN-4425.20160105-1.patch
>
>
> As part of the support for sharing NonExclusive Node Label partitions in 
> YARN-3214, NonExclusive partitions are shared only with the Default 
> Partition, and there is a fixed rule for when apps in the default partition 
> make use of resources of any NonExclusive partitions.
> There are many scenarios where we require a pluggable policy (MultiTenant, 
> Hierarchical, etc.) in which each partition can determine when it wants to 
> share its resources with other partitions and when other partitions may use 
> resources from it.
> More details in the attached document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083223#comment-15083223
 ] 

Karthik Kambatla commented on YARN-1011:


We would run an opportunistic container on a node only if the actual 
utilization is less than the allocation by a margin bigger than the allocation 
of said opportunistic container. We reactively preempt the opportunistic 
container if the actual utilization goes over a threshold. To address spikes in 
usage where our reactive measures are too slow to kick in, we run the 
opportunistic containers at a strictly lower priority. 

bq. the app got opportunistic containers and their perf wasnt the same as 
normal containers - so it ran slower. 
As soon as we realize the perf is slower because the node has higher usage than 
we had anticipated, we preempt the container and retry allocation (guaranteed 
or opportunistic depending on the new cluster state). So, it shouldn't run 
slower for longer than our monitoring interval. Is this assumption okay? 

bq. However, things get complicated because a node with an opportunistic 
container may continue to run its normal containers while space frees up for 
guaranteed capacity on other nodes.
The opportunistic container will continue to run on this node so long as it is 
getting the resources it needs. If there is any sort of resource contention, it 
is preempted and is up for allocation on one of the free nodes. 

bq. This would require that the system upgrade opportunistic containers in the 
same order as it would allocate containers.
bq. IMO, the NM cannot make a local choice about upgrading its opportunistic 
containers because this is effectively a resource allocation decision and only 
the RM has the info to do that.
The RM schedules the next highest priority "task" for which it couldn't find a 
guaranteed container as an opportunistic container. This task continues to run 
as long as it is not getting enough resources. If there is no resource 
contention, the task continues to run. If guaranteed resources free up on the 
node it is running on, isn't it fair to promote the container to Guaranteed? 
After all, if the unused resources had not been hidden behind other containers' 
allocations and had actually been available as guaranteed capacity on that node 
initially, the RM would just have scheduled a guaranteed container in the first 
place.

I should probably clarify that the proposal here targets those cases where 
users' estimates are significantly off from reality and there are enough free 
resources per node to run additional task(s) without causing any resource 
contention. Even though this is the norm, we want to guard against spikes in 
usage to avoid perf regressions. In practice, I expect admins to come up with a 
reasonable threshold for over-subscription: e.g. 0.8 - we would only 
oversubscribe up to 80% of the capacity advertised through 
{{yarn.nodemanager.resource.*}}


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083226#comment-15083226
 ] 

Karthik Kambatla commented on YARN-1011:


Since the plan is to develop this on a branch, we could get started on some of 
the sub-tasks and adjust things as the design evolves. I am creating the 
YARN-1011 branch to track this work. 

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083248#comment-15083248
 ] 

Rohith Sharma K S commented on YARN-4479:
-

Updated the 0005-YARN-4479 patch; kindly review it.

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, 
> 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, 
> 0005-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, the high 
> priority applications get activated first during recovery. It is possible 
> that a low priority job had been submitted earlier and was already in the 
> running state. 
> This can leave the low priority job starving after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083538#comment-15083538
 ] 

Wangda Tan commented on YARN-4304:
--

Hi Sunil,

Thanks for the update. I can still see {{"maxAMLimitPercentage"}} in the parent 
queue; could you double check it?

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, 0006-YARN-4304.patch, 0007-YARN-4304.patch, 
> 0008-YARN-4304.patch, 0009-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition max AM resource percentage configuration, 
> the UI and various metrics also need to display the correct configurations 
> related to it. 
> For example, the current UI still shows the am-resource percentage at the 
> queue level. This is to be updated correctly when a label config is used.
> - Display max-am-percentage per-partition in the Scheduler UI (label also) 
> and in the ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083473#comment-15083473
 ] 

Li Lu commented on YARN-4224:
-

Thanks Varun. 
bq. /flows endpoint is for querying active flows with default cluster id. And 
/clusters/{clusterid}/flows/ endpoint is for querying flows for specific 
cluster id.
OK, as long as such endpoints are intact I'm fine. BTW, would it be a good idea 
to rename /flows to something like /activeFlows to avoid a clash (and to 
distinguish it from other endpoints that end with flow)? 

> Support fetching entities by UID and change the REST interface to conform to 
> current REST APIs' in YARN
> ---
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-05 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4304:
--
Attachment: 0009-YARN-4304.patch

Thank you, [~leftnoteasy].
Uploading a new patch addressing the comments.

Also attached the REST output with labels after making the changes based on the 
review comments. Kindly help to check the same.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, 0006-YARN-4304.patch, 0007-YARN-4304.patch, 
> 0008-YARN-4304.patch, 0009-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition max AM resource percentage configuration, 
> the UI and various metrics also need to display the correct configurations 
> related to it. 
> For example, the current UI still shows the am-resource percentage at the 
> queue level. This is to be updated correctly when a label config is used.
> - Display max-am-percentage per-partition in the Scheduler UI (label also) 
> and in the ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083480#comment-15083480
 ] 

Li Lu commented on YARN-4265:
-

bq. Can we provide a default one (could be simple) for built-in YARN 
applications, i.e. distributed shell?
We can certainly do that, given that the new APIs are in and we've got this 
patch. Since this is a separate piece of work from introducing the whole new 
storage (as in this JIRA), maybe we can address it in a new JIRA? I can start 
off doing that right away. 

bq.  Looks like we still need to make effort on checking the whole 
synchronization mechanism. Isn't it?
I checked all the synchronization logic a few times before I submitted the 
original patch. We're using fine-grained locking in this patch, so yes, it's 
more error prone. I'll check again later on and fix any problems I discover. 
Feel free to raise any concerns in your review. Thanks! 

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior to the 
> EntityFileTimelineStore proposed in YARN-3942, but it caches data at cache id 
> granularity instead of application id granularity. Let's have this storage 
> be a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-05 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4304:
--
Attachment: (was: REST_and_UI.zip)

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, 0006-YARN-4304.patch, 0007-YARN-4304.patch, 
> 0008-YARN-4304.patch
>
>
> As we are supporting per-partition max AM resource percentage configuration, 
> the UI and various metrics also need to display the correct configurations 
> related to it. 
> For example, the current UI still shows the am-resource percentage at the 
> queue level. This is to be updated correctly when a label config is used.
> - Display max-am-percentage per-partition in the Scheduler UI (label also) 
> and in the ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-05 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4304:
--
Attachment: REST_and_UI.zip

Uploading screenshots and REST output as per the latest change.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, 0006-YARN-4304.patch, 0007-YARN-4304.patch, 
> 0008-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition max AM resource percentage configuration, 
> the UI and various metrics also need to display the correct configurations 
> related to it. 
> For example, the current UI still shows the am-resource percentage at the 
> queue level. This is to be updated correctly when a label config is used.
> - Display max-am-percentage per-partition in the Scheduler UI (label also) 
> and in the ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs

2016-01-05 Thread Li Lu (JIRA)
Li Lu created YARN-4545:
---

 Summary: Allow YARN distributed shell to use ATS v1.5 APIs
 Key: YARN-4545
 URL: https://issues.apache.org/jira/browse/YARN-4545
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu


We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to 
allow distributed shell to post data with the ATS v1.5 API if v1.5 is enabled 
in the system. We also need to provide a sample plugin to read that data out. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083611#comment-15083611
 ] 

Hadoop QA commented on YARN-4479:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 18s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 21s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 27s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 51s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 219m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed 

[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083669#comment-15083669
 ] 

Hadoop QA commented on YARN-4537:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 37s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 58s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 0s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 43s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 2s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 40s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 216m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || 

[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083758#comment-15083758
 ] 

Hadoop QA commented on YARN-4304:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
0s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s 
{color} | {color:red} Patch generated 22 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 251, now 261). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 35s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 27s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 141m 37s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesForCSWithPartitions |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesForCSWithPartitions |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| 

[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser

2016-01-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083077#comment-15083077
 ] 

Jason Lowe commented on YARN-3089:
--

As I mentioned in my last comment above, this shouldn't be needed in 2.6.  The 
problem is triggered by log rotation aggregation support which is only in 2.7.  
It shouldn't hurt to have it in 2.6, but it's not necessary either.

> LinuxContainerExecutor does not handle file arguments to deleteAsUser
> -
>
> Key: YARN-3089
> URL: https://issues.apache.org/jira/browse/YARN-3089
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3089.v1.txt, YARN-3089.v2.txt, YARN-3089.v3.txt
>
>
> YARN-2468 added the deletion of individual logs that are aggregated, but this 
> fails to delete log files when the LCE is being used. The LCE native 
> executable assumes the paths being passed are directories, and the delete 
> fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083683#comment-15083683
 ] 

Bikas Saha commented on YARN-1011:
--

Good points, but let me play the devil's advocate to get some more clarity :)
bq. As soon as we realize the perf is slower because the node has higher usage 
than we had anticipated, we preempt the container and retry allocation 
(guaranteed or opportunistic depending on the new cluster state). So, it 
shouldn't run slower for longer than our monitoring interval. Is this 
assumption okay?
How do we determine that the perf is slower? The CPU would never exceed 100% 
even under over-allocation. Is preempting always necessary? If we are sure 
that the OS is going to starve the opportunistic containers, then can we 
assume that when the node is fully utilized, only our guaranteed containers 
are using resources? If so, we can let the opportunistic containers be, so 
that they can start soaking up excess capacity after the normal containers 
have stopped spiking. Perhaps some experiments will shed some light on this.

bq. The opportunistic container will continue to run on this node so long as it 
is getting the resources it needs. If there is any sort of resource contention, 
it is preempted and is up for allocation on one of the free nodes.
Let's say the job's capacity is 1 container and the job asks for 2. It gets 1 
normal container and 1 opportunistic container. Now it releases its 1 normal 
container. At this point, what happens to the opportunistic container? It is 
clearly running at lower priority on the node, and as such we are not giving 
the job its guaranteed capacity. The question is not about finding an optimal 
solution for this problem (and there may not be one). The issue here is to 
crisply define the semantics around scheduling in the design. Whatever the 
semantics are, we should clearly know what they are. IMO, the exact semantics 
of scheduling should be in the docs.

bq. The RM schedules the next highest priority "task" for which it couldn't 
find a guaranteed container as an opportunistic container. This task continues 
to run as long as it is not getting enough resources. If there is no resource 
contention, the task continues to run. If guaranteed resources free up on the 
node it is running, isn't it fair to promote the container to Guaranteed.
Sure. And that's why the system should upgrade opportunistic containers in the 
order in which they were allocated. However, the decision must be made at the 
RM and not the NM, since the NMs don't know about total capacity, and multiple 
NMs locally upgrading their opportunistic containers might end up 
over-allocating for a job. Further, the queue sharing state may have changed 
since the opportunistic allocation, and hence assuming that the opportunistic 
container "would have" gotten that allocation anyway at a later point in time 
may not be valid.

In summary, what we need in the document is a clear definition of the 
scheduling policy around this - whatever that policy may be.


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3934) Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used

2016-01-05 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote reassigned YARN-3934:
-

Assignee: Dustin Cote

> Application with large ApplicationSubmissionContext can cause RM to exit when 
> ZK store is used
> --
>
> Key: YARN-3934
> URL: https://issues.apache.org/jira/browse/YARN-3934
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Dustin Cote
>
> Use the following steps to test.
> 1. Set up ZK as the RM HA store.
> 2. Submit a job that refers to lots of distributed cache files with long HDFS 
> paths, which will cause the app state size to exceed ZK's max object size 
> limit.
> 3. The RM can't write to ZK and exits with the following exception.
> {noformat}
> 2015-07-10 22:21:13,002 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083)
> {noformat}
> In this case, the RM could have rejected the app during the submitApplication 
> RPC if the size of the ApplicationSubmissionContext is too large.
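
(A hypothetical sketch of such a guard in the submission path; the limit, its name, and the PBImpl cast are assumptions, not existing YARN code.)
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl;
import org.apache.hadoop.yarn.exceptions.YarnException;

class SubmissionSizeCheck {
  // Reject the app up front if its submission context would exceed the
  // state-store (ZK) object size limit; maxStateSizeBytes is hypothetical.
  static void checkSize(ApplicationSubmissionContext context,
      int maxStateSizeBytes) throws YarnException {
    int size = ((ApplicationSubmissionContextPBImpl) context)
        .getProto().getSerializedSize();
    if (size > maxStateSizeBytes) {
      throw new YarnException("ApplicationSubmissionContext is " + size
          + " bytes, which exceeds the configured limit of "
          + maxStateSizeBytes + " bytes; rejecting the submission.");
    }
  }
}
{code}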



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-05 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen resolved YARN-4539.

Resolution: Fixed

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When the scheduler initialization fails and the RM stops the 
> CompositeService, the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084943#comment-15084943
 ] 

Naganarasimha G R commented on YARN-4537:
-

+1 LGTM

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch, 
> 0003-YARN-4537.patch, 0003-YARN-4537.patch
>
>
> Currently, priority comparison is integrated with FifoComparator. There 
> should be a separate comparator defined for priority comparison so that, down 
> the line, if any new ordering policy wants to integrate priority, it can use 
> a compound comparator where priority takes higher preference. 
> The following changes are expected to be done as part of this JIRA:
> # Pull out priority comparison from FifoComparator
> # Define a new priority comparator
> # Use a compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4479:

Attachment: 0006-YARN-4479.patch

Updating the patch, fixing the test failure in TestApplicationLimits.
The other test failures are unrelated to the patch.
The findbugs -1 is due to the output file not being found; I think it is also 
unrelated to the patch.

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, 
> 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, 
> 0005-YARN-4479.patch, 0006-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, the high 
> priority applications get activated first during recovery. It is possible 
> that a low priority job had been submitted earlier and was already in the 
> running state. 
> This can leave the low priority job starving after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-05 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085011#comment-15085011
 ] 

Bibin A Chundatt commented on YARN-4538:


The test failures are not related to the attached patch.
All CI failures are already tracked as part of YARN-4478.

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4538.patch
>
>
> Submit 2 applications to the default queue.
> Check the queue metrics for pending cores and memory.
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output :*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084608#comment-15084608
 ] 

Jian He commented on YARN-4479:
---

+1. [~rohithsharma], could you see if the warnings are related?

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, 
> 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, 
> 0005-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, the high 
> priority applications get activated first during recovery. It is possible 
> that a low priority job had been submitted earlier and was already in the 
> running state. 
> This can leave the low priority job starving after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4547) LeagQueue#getApplications() is read-only interface, but it provides reference to caller

2016-01-05 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4547:
---

 Summary: LeagQueue#getApplications() is read-only interface, but 
it provides reference to caller
 Key: YARN-4547
 URL: https://issues.apache.org/jira/browse/YARN-4547
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


The API below is a read-only interface, but it returns a live reference to the 
caller. This allows the caller to modify the orderingPolicy entities. If a 
reference to the ordering policy is really required, the caller can use 
{{LeagQueue#getOrderingPolicy()#getSchedulableEntities()}}
{code}
  /**
   * Obtain (read-only) collection of active applications.
   */
  public Collection getApplications() {
return orderingPolicy.getSchedulableEntities();
  }
{code}
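
For illustration, a minimal sketch of the clone/read-only idea (a sketch only, not the committed change; the defensive copy and the unmodifiable wrapper are assumptions about one possible fix):
{code}
  /**
   * Obtain a (truly read-only) snapshot of active applications.
   * Sketch only: copy the entities and wrap them, so callers cannot mutate the
   * collection backing the ordering policy.
   */
  public Collection getApplications() {
    return java.util.Collections.unmodifiableCollection(
        new java.util.ArrayList<Object>(orderingPolicy.getSchedulableEntities()));
  }
{code}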



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4547) LeagQueue#getApplications() is read-only interface, but it provides reference to caller

2016-01-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4547:

Description: 
The API below is a read-only interface, but it returns a live reference to the 
caller. This allows the caller to modify the orderingPolicy entities. If a 
reference to the ordering policy is really required, the caller can use 
{{LeagQueue#getOrderingPolicy()#getSchedulableEntities()}}

The returned object should be a clone of orderingPolicy.getSchedulableEntities()
{code}
  /**
   * Obtain (read-only) collection of active applications.
   */
  public Collection getApplications() {
return orderingPolicy.getSchedulableEntities();
  }
{code}

  was:
The API below is a read-only interface, but it returns a live reference to the 
caller. This allows the caller to modify the orderingPolicy entities. If a 
reference to the ordering policy is really required, the caller can use 
{{LeagQueue#getOrderingPolicy()#getSchedulableEntities()}}
{code}
  /**
   * Obtain (read-only) collection of active applications.
   */
  public Collection getApplications() {
return orderingPolicy.getSchedulableEntities();
  }
{code}


> LeagQueue#getApplications() is read-only interface, but it provides reference 
> to caller
> ---
>
> Key: YARN-4547
> URL: https://issues.apache.org/jira/browse/YARN-4547
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> The API below is a read-only interface, but it returns a live reference to the 
> caller. This allows the caller to modify the orderingPolicy entities. If a 
> reference to the ordering policy is really required, the caller can use 
> {{LeagQueue#getOrderingPolicy()#getSchedulableEntities()}}
> The returned object should be a clone of 
> orderingPolicy.getSchedulableEntities()
> {code}
>   /**
>* Obtain (read-only) collection of active applications.
>*/
>   public Collection getApplications() {
> return orderingPolicy.getSchedulableEntities();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-4539.
-
Resolution: Duplicate

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When the scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> {noformat}
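
A minimal sketch of the defensive pattern the stack trace suggests, assuming a nullable {{dispatcher}} field (illustration only, not the committed patch):
{code}
  // Sketch only: serviceStop() paths must tolerate fields that were never created
  // because serviceInit() failed part-way through.
  protected void stopDispatcher() {
    AsyncDispatcher asyncDispatcher = dispatcher;
    if (asyncDispatcher != null) {   // dispatcher is null when init failed early
      asyncDispatcher.stop();
    }
  }
{code}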



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reopened YARN-4539:
-

Reopening the issue to mark it as a duplicate.

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When the scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-05 Thread tangshangwen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084576#comment-15084576
 ] 

tangshangwen commented on YARN-4539:


OK, Thanks [~bibinchundatt]

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When the scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-05 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen reopened YARN-4539:


> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When the scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084602#comment-15084602
 ] 

Rohith Sharma K S commented on YARN-4539:
-

Hi [~tangshangwen], when closing the issue as a duplicate, mark the issue 
resolution as Duplicate. Resolution "Fixed" should be used only if a patch was 
committed under this JIRA.

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When the scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-05 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen resolved YARN-4539.

Resolution: Duplicate

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When the scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message

2016-01-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085027#comment-15085027
 ] 

Naganarasimha G R commented on YARN-3946:
-

I will take a look and update. Thanks for informing!

> Update exact reason as to why a submitted app is in ACCEPTED state to app's 
> diagnostic message
> --
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, 
> YARN-3946.v1.007.patch, YARN-3946.v1.008.patch
>
>
> Currently there is no direct way to get the exact reason why a 
> submitted app is still in the ACCEPTED state. It should be possible to know 
> through the RM REST API which aspect is not being met - say, queue limits 
> being reached, core/memory requirements not being met, or the AM limit being 
> reached, etc.
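
As a point of reference, once such a reason is written into the application's diagnostics, a client can read it back with the standard YarnClient API; a small sketch (the application id argument is a placeholder):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Sketch only: print the state and diagnostic message of a submitted application,
// which is where the "why is this app still ACCEPTED" reason would surface.
public class PrintAppDiagnostics {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      ApplicationId appId = ConverterUtils.toApplicationId(args[0]);
      ApplicationReport report = client.getApplicationReport(appId);
      System.out.println("State: " + report.getYarnApplicationState());
      System.out.println("Diagnostics: " + report.getDiagnostics());
    } finally {
      client.stop();
    }
  }
}
{code}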



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4547) LeafQueue#getApplications() is read-only interface, but it provides reference to caller

2016-01-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4547:

Summary: LeafQueue#getApplications() is read-only interface, but it 
provides reference to caller  (was: LeagQueue#getApplications() is read-only 
interface, but it provides reference to caller)

> LeafQueue#getApplications() is read-only interface, but it provides reference 
> to caller
> ---
>
> Key: YARN-4547
> URL: https://issues.apache.org/jira/browse/YARN-4547
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> The API below is a read-only interface, but it returns a live reference to the 
> caller. This allows the caller to modify the orderingPolicy entities. If a 
> reference to the ordering policy is really required, the caller can use 
> {{LeagQueue#getOrderingPolicy()#getSchedulableEntities()}}
> The returned object should be a clone of 
> orderingPolicy.getSchedulableEntities()
> {code}
>   /**
>* Obtain (read-only) collection of active applications.
>*/
>   public Collection getApplications() {
> return orderingPolicy.getSchedulableEntities();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2016-01-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084955#comment-15084955
 ] 

Naganarasimha G R commented on YARN-4238:
-

IMO, I don't see much value or a use case for *modified time*; if required, we 
can add it in the future, and I would prefer not to include it now. Thoughts?

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from the RM and elsewhere, we are not sending the created 
> time. For instance, the created time in the TimelineServiceV2Publisher class and for 
> other entities in other similar classes is not updated. We can easily 
> set the created time when sending the application-created event, and likewise the 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message

2016-01-05 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085012#comment-15085012
 ] 

Akira AJISAKA commented on YARN-3946:
-

Hi [~Naganarasimha] and [~leftnoteasy], TestNetworkedJob is failing after this 
issue. Would you please review the patch in MAPREDUCE-6579?

> Update exact reason as to why a submitted app is in ACCEPTED state to app's 
> diagnostic message
> --
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, 
> YARN-3946.v1.007.patch, YARN-3946.v1.008.patch
>
>
> Currently there is no direct way to get the exact reason why a 
> submitted app is still in the ACCEPTED state. It should be possible to know 
> through the RM REST API which aspect is not being met - say, queue limits 
> being reached, core/memory requirements not being met, or the AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-05 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4538:
---
Attachment: 0001-YARN-4538.patch

Thanks for looking into the issue, [~leftnoteasy].

* Added logic to skip the pending metrics update when containers <= 0 (see the sketch below)
* Updated the patch for the issue. Please review.
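
A minimal sketch of the guard described in the first bullet, using a simplified metrics holder (class and method names here are illustrative, not the actual QueueMetrics change):
{code}
// Sketch only: skip the pending-resource decrement when the container count is
// not positive, so pending vCores/memory cannot be driven negative.
public class PendingQueueMetrics {
  private long pendingMemoryMB;
  private long pendingVCores;

  public void decreasePending(int containers, long memoryMBPerContainer, long vCoresPerContainer) {
    if (containers <= 0) {
      return;   // nothing to remove; avoids negative pending metrics
    }
    pendingMemoryMB -= containers * memoryMBPerContainer;
    pendingVCores -= containers * vCoresPerContainer;
  }

  public long getPendingMemoryMB() { return pendingMemoryMB; }
  public long getPendingVCores() { return pendingVCores; }
}
{code}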

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4538.patch
>
>
> Submit 2 application to default queue 
> Check queue metrics for pending cores and memory
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output :*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084606#comment-15084606
 ] 

Rohith Sharma K S commented on YARN-4537:
-

Test case failures are handled in YARN-4306.

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch, 
> 0003-YARN-4537.patch, 0003-YARN-4537.patch
>
>
> Currently, priority comparison is integrated into FifoComparator. There 
> should be a separate comparator defined for priority comparison so that, down 
> the line, any new ordering policy that wants to integrate priority can use a 
> compound comparator where priority takes precedence. 
> The following changes are expected to be done as part of this JIRA:
> # Pull out priority comparison from FifoComparator
> # Define a new priority comparator
> # Use a compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO (see the sketch below)
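
A small, self-contained illustration of the compound-comparator idea (the class and field names below are made up for the example; the real classes are FifoComparator and FifoOrderingPolicy):
{code}
import java.util.Comparator;

// Sketch only: compare by priority first, then fall back to FIFO (submission order).
class DemoApp {
  final int priority;     // higher value = higher priority
  final long submitTime;  // earlier submission wins ties

  DemoApp(int priority, long submitTime) {
    this.priority = priority;
    this.submitTime = submitTime;
  }
}

class DemoOrderingPolicies {
  static final Comparator<DemoApp> PRIORITY =
      Comparator.comparingInt((DemoApp a) -> a.priority).reversed();
  static final Comparator<DemoApp> FIFO =
      Comparator.comparingLong((DemoApp a) -> a.submitTime);
  // Order of preference: Priority, then FIFO.
  static final Comparator<DemoApp> FIFO_ORDERING_WITH_PRIORITY = PRIORITY.thenComparing(FIFO);
}
{code}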



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4548) TestCapacityScheduler.testRecoverRequestAfterPreemption fails with NPE

2016-01-05 Thread Akihiro Suda (JIRA)
Akihiro Suda created YARN-4548:
--

 Summary: TestCapacityScheduler.testRecoverRequestAfterPreemption 
fails with NPE
 Key: YARN-4548
 URL: https://issues.apache.org/jira/browse/YARN-4548
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Akihiro Suda


{code}
testRecoverRequestAfterPreemption(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler)
  Time elapsed: 5.552 sec 
<<< ERROR!
java.lang.NullPointerException: null
   at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testRecoverRequestAfterPreemption(TestCapacityScheduler.java:1263)
{code}
https://github.com/apache/hadoop/blob/d36b6e045f317c94e97cb41a163aa974d161a404/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java#L1260-L1263

Jenkins also hit this two months ago: 
https://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3C1100047319.7290.1446252743553.JavaMail.jenkins@crius%3E

My Hadoop version: 4e4b3a8465a8433e78e015cb1ce7e0dc1ebeb523 (Dec 30, 2015)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-05 Thread tangshangwen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084575#comment-15084575
 ] 

tangshangwen commented on YARN-4539:


OK, Thanks [~bibinchundatt]

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When the scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-05 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084562#comment-15084562
 ] 

Bibin A Chundatt commented on YARN-4539:


[~tangshangwen]  Can we close this jira?

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When the scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084611#comment-15084611
 ] 

Sunil G commented on YARN-4304:
---

Hi [~leftnoteasy]
Thank you for pointing that out. As I see it, maxAMLimitPercentage is a primitive 
type (float).

So I think setting null won't help there. I will try some other 
options and will update.
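
For what it's worth, a tiny sketch of the usual alternatives when a primitive field cannot be null (the field and class names are made up for the example):
{code}
// Sketch only: representing "not configured" when the value is a primitive float.
public class PartitionAmLimit {
  // Sentinel: a negative value can never be a valid percentage, so it means "unset".
  private float maxAMLimitPercentage = -1f;

  public void set(float value) {
    this.maxAMLimitPercentage = value;
  }

  public boolean isConfigured() {
    return maxAMLimitPercentage >= 0f;
  }

  public float getOrDefault(float clusterDefault) {
    return isConfigured() ? maxAMLimitPercentage : clusterDefault;
  }
}
{code}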

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, 0006-YARN-4304.patch, 0007-YARN-4304.patch, 
> 0008-YARN-4304.patch, 0009-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition max AM resource percentage 
> configuration, the UI and various metrics also need to display the correct 
> configuration for it. 
> For example: the current UI still shows the am-resource percentage at the queue level. This 
> should be updated correctly when label configuration is used.
> - Display max-am-percentage per partition in the Scheduler UI (including labels) and on the 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t. per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085004#comment-15085004
 ] 

Hadoop QA commented on YARN-4538:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 1s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 32s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 46s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780661/0001-YARN-4538.patch |
| JIRA Issue | YARN-4538 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| 

[jira] [Commented] (YARN-4168) Test TestLogAggregationService.testLocalFileDeletionOnDiskFull failing

2016-01-05 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085020#comment-15085020
 ] 

Akihiro Suda commented on YARN-4168:


I'm also hitting this one and YARN-1978.


> Test TestLogAggregationService.testLocalFileDeletionOnDiskFull failing
> --
>
> Key: YARN-4168
> URL: https://issues.apache.org/jira/browse/YARN-4168
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Steve Loughran
>Priority: Critical
>
> {{TestLogAggregationService.testLocalFileDeletionOnDiskFull}} failing on 
> [Jenkins build 
> 1136|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk/1136/testReport/junit/org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation/TestLogAggregationService/testLocalFileDeletionOnDiskFull/]
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:229)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:285)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1978) TestLogAggregationService#testLocalFileDeletionAfterUpload fails sometimes

2016-01-05 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085021#comment-15085021
 ] 

Akihiro Suda commented on YARN-1978:


I'm also hitting this one and YARN-4168.


> TestLogAggregationService#testLocalFileDeletionAfterUpload fails sometimes
> --
>
> Key: YARN-1978
> URL: https://issues.apache.org/jira/browse/YARN-1978
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1978.txt
>
>
> This happens in a Windows VM, though the issue isn't related to Windows.
> {code}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
> ---
> Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.859 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
> testLocalFileDeletionAfterUpload(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService)
>   Time elapsed: 0.906 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: check 
> Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\TestLogAggregationService-localLogDir\application_1234_0001\container_1234_0001_01_01\stdout
> at junit.framework.Assert.fail(Assert.java:50)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertFalse(Assert.java:34)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionAfterUpload(TestLogAggregationService.java:201)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2016-01-05 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083929#comment-15083929
 ] 

Arun Suresh commented on YARN-4335:
---

[~leftnoteasy], YARN-2885 depends on this, and YARN-4412 depends on YARN-2885.

> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.
> We will initially support two resource request types: CONSERVATIVE and 
> OPTIMISTIC.
> CONSERVATIVE resource requests will be handed internally to containers of 
> GUARANTEED type, whereas OPTIMISTIC resource requests will be handed to 
> QUEUEABLE containers.
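
A toy sketch of the proposed mapping from user-facing request types to the internal container types from YARN-2882 (the enum and method names are illustrative, not the actual API in the attached patches):
{code}
// Sketch only: CONSERVATIVE asks map to GUARANTEED containers; OPTIMISTIC asks
// map to QUEUEABLE containers handled by queue-based/distributed scheduling.
enum RequestType { CONSERVATIVE, OPTIMISTIC }

enum ContainerType { GUARANTEED, QUEUEABLE }

class RequestTypeMapper {
  static ContainerType toContainerType(RequestType type) {
    switch (type) {
      case CONSERVATIVE:
        return ContainerType.GUARANTEED;
      case OPTIMISTIC:
      default:
        return ContainerType.QUEUEABLE;
    }
  }
}
{code}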



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4546) ResourceManager crash due to scheduling opportunity overflow

2016-01-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083927#comment-15083927
 ] 

Jason Lowe commented on YARN-4546:
--

When the overflow occurs the RM crashes with a stacktrace like this:
{noformat}
2015-12-26 20:18:39,731 [ResourceManager Event Processor] FATAL 
resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to 
the scheduler
java.lang.IllegalArgumentException: count cannot be negative: -2147483648
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
at 
com.google.common.collect.Multisets.checkNonnegative(Multisets.java:943)
at 
com.google.common.collect.AbstractMapBasedMultiset.setCount(AbstractMapBasedMultiset.java:277)
at com.google.common.collect.HashMultiset.setCount(HashMultiset.java:34)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addSchedulingOpportunity(SchedulerApplicationAttempt.java:485)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:872)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:586)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:447)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1019)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1061)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:115)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:682)
at java.lang.Thread.run(Thread.java:745)
2015-12-26 20:18:39,732 [ResourceManager Event Processor] INFO 
resourcemanager.ResourceManager: Exiting, bbye..
{noformat}

In this particular case the resource request went unsatisfied for a long time 
due to the use of node labels and the application having blacklisted every node 
with that label.  At that point no node in the cluster could satisfy the 
request because it either didn't have the label or it was blacklisted.  So the 
resource request accumulated scheduling opportunities until the count 
eventually overflowed.
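
A minimal sketch of one way to avoid the wrap-around, using the same Guava Multiset that appears in the stack trace (an illustration of a saturating counter, not the committed fix):
{code}
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import org.apache.hadoop.yarn.api.records.Priority;

// Sketch only: cap the per-priority scheduling-opportunity count so a request that
// stays unsatisfied for a very long time cannot overflow past Integer.MAX_VALUE.
class SchedulingOpportunityCounter {
  private final Multiset<Priority> schedulingOpportunities = HashMultiset.create();

  void addSchedulingOpportunity(Priority priority) {
    int current = schedulingOpportunities.count(priority);
    if (current < Integer.MAX_VALUE) {
      schedulingOpportunities.setCount(priority, current + 1);
    }
  }

  int count(Priority priority) {
    return schedulingOpportunities.count(priority);
  }
}
{code}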

> ResourceManager crash due to scheduling opportunity overflow
> 
>
> Key: YARN-4546
> URL: https://issues.apache.org/jira/browse/YARN-4546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.1
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
>
> If a resource request lingers long enough unsatisfied then the scheduling 
> opportunities count for the request can overflow and cause an RM crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4200) Refactor reader classes in storage to nest under hbase specific package name

2016-01-05 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083932#comment-15083932
 ] 

Vrushali C commented on YARN-4200:
--

Hi [~gtCarrera]

The patch looks good, but I believe it's outdated now (since we have a new code 
branch, etc.). Do you want to rebase it? Overall it looks fine.

thanks
Vrushali


> Refactor reader classes in storage to nest under hbase specific package name
> 
>
> Key: YARN-4200
> URL: https://issues.apache.org/jira/browse/YARN-4200
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Li Lu
>Priority: Minor
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4200-YARN-2928.001.patch
>
>
> As suggested by [~gtCarrera9] in YARN-4074, filing jira to refactor the code 
> to group together the reader classes under a package in storage that 
> indicates these are hbase specific. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2016-01-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083940#comment-15083940
 ] 

Wangda Tan commented on YARN-2885:
--

Thanks [~asuresh],

I'm fine with this patch's logic to interact with existing YARN components. I 
would like to call in help from others to review distributed scheduling related 
logic.

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch, 
> YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, 
> YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, 
> YARN-2885-yarn-2877.v4.patch, YARN-2885-yarn-2877.v5.patch, 
> YARN-2885_api_changes.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2016-01-05 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083947#comment-15083947
 ] 

Vrushali C commented on YARN-4356:
--

Hmm, I am not sure it is inconsistent. The actual behavior right now checks 
for an exact 2. The javadoc is more comprehensive, so it still holds true 
since 2 is smaller than 3. Now, say we release v2 and are then working on 
v2.1: we might want to add a check for 2.1 that does something different 
from 2, while at the same time it's not really timeline v3. So IMO, the check is 
fine and the javadoc is also fine.
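
To make the two readings concrete, a hedged sketch (the configuration key below is assumed to be the timeline service version key; the helper names are illustrative):
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: "exactly v2" (current behavior) vs. "v2.x but below v3" (what the
// javadoc allows), so a future 2.1 could still be treated as the v2 series.
class TimelineVersionChecks {
  static final String VERSION_KEY = "yarn.timeline-service.version";  // assumed key

  static boolean isExactlyV2(Configuration conf) {
    return conf.getFloat(VERSION_KEY, 1.0f) == 2.0f;
  }

  static boolean isV2Series(Configuration conf) {
    float v = conf.getFloat(VERSION_KEY, 1.0f);
    return v >= 2.0f && v < 3.0f;
  }
}
{code}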

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
>
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.005.patch, YARN-4356-feature-YARN-2928.006.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083957#comment-15083957
 ] 

Bikas Saha commented on YARN-1011:
--

I agree with relying on natural container churn rather than preemption to avoid lost work, 
though the issue of clearly defining the scheduler policy still remains.

bq.  If we were oversubscribing 10X then I'd probably want it for sure, but if 
it's at most 2X capacity then worst case is a container only gets 50% of the 
resource it had requested. Obviously for something like memory this has to be 
closely controlled because going over the physical capabilities of the machine 
has very significant consequences. But for CPU, I'd definitely be inclined to 
live with the occasional 50% worst case for all containers, in order to avoid 
the 1/1024th worst case for OPPORTUNISTIC containers on a busy node.
I did not understand this. Does this mean it's ok for normal containers to run 
50% slower in the presence of opportunistic containers? If yes, then there are 
scenarios where this may not be a valid choice, e.g. when a cluster is running 
a mix of SLA and non-SLA jobs. Non-SLA jobs are ok if their containers get 
slowed down to increase cluster utilization by running opportunistic containers, 
because we are getting higher overall throughput. But SLA jobs are not ok with 
missing deadlines because their tasks ran 50% slower. 

IMO, the litmus test for a feature like this would be to take an existing 
cluster (with low utilization because tasks are asking for more resources than 
they need 100% of the time), then turn this feature on and get better cluster 
utilization and throughput without affecting the existing workload, whatever 
the internal implementation details may be. Agree?

bq. 50% of maximum-under-utilized resource of past 30 min for each NM can be 
used to allocate opportunistic containers.
These are heuristics and may all be valid under different circumstances. What 
we should step back and see is the source of this optimization.
Observation : Cluster is under-utilized despite being fully allocated
Possible reasons : 
1) Tasks are incorrectly over-allocated. They will never use the resources they 
ask for, and hence we can safely run additional opportunistic containers. So 
this feature is used to compensate for poorly configured applications. Probably 
a valid scenario, but is it common?
2) Tasks are correctly allocated but don't use their capacity to the limit all 
the time. E.g. Terasort will use high CPU only during the sorting but not 
during the entire length of the job. But its containers will ask for enough CPU 
to run the sort in the desired time. This is a typical application behavior 
where resource usage varies over time. So this feature is used to soak up the 
fallow resources in the cluster while tasks are not using their quoted capacity.

The arguments and assumptions we make need to be considered in the light of 
which of 1 or 2 is the common case and where this feature will be useful.

While it's useful to have configuration knobs, for a complex dynamic feature 
like this, one that is basically reacting to runtime observations, it may be 
quite hard to configure things statically by hand. While some limits, such as a 
maximum over-allocation limit, are easy and probably required to configure, we 
should look at making this feature work by itself instead of relying 
exclusively on configuration (hell :P) for users to make this feature usable.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2016-01-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083959#comment-15083959
 ] 

Li Lu commented on YARN-4356:
-

Ah OK. But we do need to update the actual code when we introduce a new version 
like 2.1, right? 

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
>
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.005.patch, YARN-4356-feature-YARN-2928.006.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4546) ResourceManager crash due to scheduling opportunity overflow

2016-01-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4546:
-
Attachment: YARN-4546.001.patch

Patch to cap scheduling opportunities so they don't overflow.
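
To make the idea concrete, here is a minimal sketch of a capped counter; the 
class, field, and method names are illustrative and are not taken from the 
attached patch.
{code}
import java.util.HashMap;
import java.util.Map;

// Cap the per-priority scheduling-opportunity counter so a long-unsatisfied
// request cannot push it past Integer.MAX_VALUE and wrap around to a
// negative value.
class SchedulingOpportunitiesSketch {
  private final Map<Integer, Integer> countsByPriority = new HashMap<>();

  void addSchedulingOpportunity(int priority) {
    int count = countsByPriority.getOrDefault(priority, 0);
    if (count < Integer.MAX_VALUE) {   // cap instead of overflowing
      countsByPriority.put(priority, count + 1);
    }
  }
}
{code}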

> ResourceManager crash due to scheduling opportunity overflow
> 
>
> Key: YARN-4546
> URL: https://issues.apache.org/jira/browse/YARN-4546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.1
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-4546.001.patch
>
>
> If a resource request lingers long enough unsatisfied then the scheduling 
> opportunities count for the request can overflow and cause an RM crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4200) Refactor reader classes in storage to nest under hbase specific package name

2016-01-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083933#comment-15083933
 ] 

Li Lu commented on YARN-4200:
-

Sure. Will do that soon. Thanks! 

> Refactor reader classes in storage to nest under hbase specific package name
> 
>
> Key: YARN-4200
> URL: https://issues.apache.org/jira/browse/YARN-4200
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Li Lu
>Priority: Minor
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4200-YARN-2928.001.patch
>
>
> As suggested by [~gtCarrera9] in YARN-4074, filing jira to refactor the code 
> to group together the reader classes under a package in storage that 
> indicates these are hbase specific. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2016-01-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083938#comment-15083938
 ] 

Li Lu commented on YARN-4356:
-

One quick query w.r.t changes in this JIRA: I just noticed that in 
YarnConfiguration#timelineServiceV2Enabled, the javadoc describes it as:
bq. whether the timeline service v.2 is enabled. V.2 refers to a version 
greater than equal to 2 but smaller than 3.
But the actual behavior is to directly check whether the timeline service 
version equals 2. I believe this will generate some inconsistency. Are there 
any special considerations? Shall we quickly put up an addendum patch? 
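
To illustrate the two readings, a small sketch (not the actual YarnConfiguration 
code) contrasting the exact-version check with the [2, 3) range the javadoc 
describes:
{code}
// Two ways to interpret "v.2": the exact check the code currently performs
// versus the [2, 3) range described in the javadoc.
class TimelineVersionCheckSketch {
  static boolean exactCheck(float version) {
    return version == 2.0f;                      // current behavior
  }

  static boolean rangeCheck(float version) {
    return version >= 2.0f && version < 3.0f;    // what the javadoc says
  }
}
{code}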

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
>
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.005.patch, YARN-4356-feature-YARN-2928.006.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2016-01-05 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083941#comment-15083941
 ] 

Vrushali C commented on YARN-4238:
--


The hbase cell timestamps may not indicate the actual entity modification time. 
They indicate the time at which that cell was written to hbase. For metrics, 
the cell timestamp is the event metric timestamp so that metrics can be 
arranged in chronological order in hbase. 

For timeline entities, I believe we should have a complete sync between the 
object model in code and the data model in hbase. So we either keep modified 
time in the entity class as well as in hbase as a column or remove it 
altogether. 

If we decide to keep modified time as a class member, could we have the 
timeline entity constructor populate it and also have the client update it? 
Also, how about making all the setter methods in the entity class update the 
modified-time value? That way the field stays relevant.
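
A minimal sketch of that suggestion, assuming a simplified entity class rather 
than the real TimelineEntity:
{code}
// Every mutator also refreshes modifiedTime, so the field stays meaningful
// without relying on callers to set it explicitly.
class TimelineEntitySketch {
  private final long createdTime = System.currentTimeMillis();
  private long modifiedTime = createdTime;
  private String state;

  void setState(String state) {
    this.state = state;
    this.modifiedTime = System.currentTimeMillis();  // touch on every update
  }

  long getCreatedTime() { return createdTime; }
  long getModifiedTime() { return modifiedTime; }
  String getState() { return state; }
}
{code}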

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4200) Refactor reader classes in storage to nest under hbase specific package name

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083968#comment-15083968
 ] 

Hadoop QA commented on YARN-4200:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s {color} 
| {color:red} YARN-4200 does not apply to YARN-2928. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12766199/YARN-4200-YARN-2928.001.patch
 |
| JIRA Issue | YARN-4200 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10162/console |


This message was automatically generated.



> Refactor reader classes in storage to nest under hbase specific package name
> 
>
> Key: YARN-4200
> URL: https://issues.apache.org/jira/browse/YARN-4200
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Li Lu
>Priority: Minor
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4200-YARN-2928.001.patch
>
>
> As suggested by [~gtCarrera9] in YARN-4074, filing jira to refactor the code 
> to group together the reader classes under a package in storage that 
> indicates these are hbase specific. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083974#comment-15083974
 ] 

Jason Lowe commented on YARN-1011:
--

bq. Tasks are incorrectly over-allocated. Will never use the resources they ask 
for and hence we can safely run additional opportunistic containers. So this 
feature is used to compensate for poorly configured applications. Probably a 
valid scenario but is it common?

In my experience this is fairly common.  Users tend to twiddle with config 
values until something is working then they don't bother to revisit until 
there's a problem.  And it's easier to over allocate than to spend the time to 
carefully tune the task size.  Even if the user is interested in tuning they 
can't always tune optimally.  Some examples are data skew or other 
task-specific issues where a few tasks need a lot of memory but the vast 
majority of the others do not.  Many frameworks only allow the task sizes to be 
configured as a group, so the user has to run all the tasks in the group with 
the worst-case container size even though most of them don't need it.  Pig on 
MapReduce is another example, where it will spawn multiple jobs but the user 
can only configure the memory settings once in the script and they apply to all 
jobs launched by the script.  Therefore the user has to set it to the 
worst-case size across all the script's jobs, and all but one of the jobs runs 
with oversized map containers.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2016-01-05 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083994#comment-15083994
 ] 

Vrushali C commented on YARN-4356:
--

Yes, if we want the behavior to be different. If the 2.1 behavior simply builds 
upon 2, then we can leave it as TimelineV2. Or if you'd like, we can have a 
simple addendum patch to make it clearer, as you had suggested.

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
>
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.005.patch, YARN-4356-feature-YARN-2928.006.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2016-01-05 Thread Konstantinos Karanasos (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-4335:
-
Attachment: YARN-4335.002.patch

New version of the patch, addressing [~leftnoteasy]'s comments.
# Fixed getExecutionType javadoc.
# Marked new APIs in ResourceRequest as unstable.

> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.
> We will initially support two resource request types: CONSERVATIVE and 
> OPTIMISTIC.
> CONSERVATIVE resource requests will be handed internally to containers of 
> GUARANTEED type, whereas OPTIMISTIC resource requests will be handed to 
> QUEUEABLE containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2016-01-05 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082917#comment-15082917
 ] 

Konstantinos Karanasos commented on YARN-4335:
--

Thank you for the feedback, [~leftnoteasy].

I addressed your first two comments and will now upload a new version of the 
patch.

bq. Javadocs of ExecutionType: if we use ExecutionType for resource request, I 
would suggest to add description that scheduler could use it to decide if idle 
resources could be used by the resource request.
Just double-checking: do you mean adding an additional description in the 
javadoc of the ExecutionType?
Let me know and I will go ahead and update the patch.

> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-4335-yarn-2877.001.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.
> We will initially support two resource request types: CONSERVATIVE and 
> OPTIMISTIC.
> CONSERVATIVE resource requests will be handed internally to containers of 
> GUARANTEED type, whereas OPTIMISTIC resource requests will be handed to 
> QUEUEABLE containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083880#comment-15083880
 ] 

Nathan Roberts commented on YARN-1011:
--

Very excited about this feature and agree that we should make this as simple as 
possible in the first go around. I have a couple of initial questions. 

bq. As soon as we realize the perf is slower because the node has higher usage 
than we had anticipated, we preempt the container and retry allocation 
(guaranteed or opportunistic depending on the new cluster state). So, it 
shouldn't run slower for longer than our monitoring interval. Is this 
assumption okay?

This seems hard. ([~bikassaha] comment above). 

All of this basically boils down to the fact that preempting a container means 
lost work, so the decision to preempt something shouldn't be taken lightly. For 
resources like memory we have to react quickly, and that's fine. But for things 
like CPU, I'm personally ok with latency on the order of single digit minutes 
so that natural container churn almost always avoids preemption.

Because of that complexity, I'm not 100% convinced that disfavoring 
OPPORTUNISTIC containers (e.g. low value for cpu_shares) is something that buys 
us very much. If we were oversubscribing 10X then I'd probably want it for 
sure, but if it's at most 2X capacity then worst case is a container only gets 
50% of the resource it had requested. Obviously for something like memory this 
has to be closely controlled because going over the physical capabilities of 
the machine has very significant consequences. But for CPU, I'd definitely be 
inclined to live with the occasional 50% worst case for all containers, in 
order to avoid the 1/1024th worst case for OPPORTUNISTIC containers on a busy 
node.
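
Worked out with illustrative numbers (not taken from this discussion), the 2X 
worst case looks like this:
{code}
public class WorstCaseShareExample {
  public static void main(String[] args) {
    // Illustrative numbers: 8 advertised vcores and an allocation cap of 2X.
    double advertisedVcores = 8.0;
    double oversubscriptionFactor = 2.0;
    double allocatedVcores = advertisedVcores * oversubscriptionFactor;  // 16
    // If every container is CPU-bound at once and CPU is shared fairly, each
    // container gets advertised/allocated = 0.5 of what it asked for.
    System.out.println(advertisedVcores / allocatedVcores);  // prints 0.5
  }
}
{code}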

So, hopefully we can make the policy quite configurable so that the amount of 
disfavoring can be tuned for various workloads.

bq. In practice, I expect admins to come up with a reasonable threshold for 
over-subscription: e.g. 0.8 - we use only oversubscribe upto 80% of capacity 
advertised through yarn.nodemanger.resource.*. Thinking more about this, this 
threshold should have an upper limit - 0.95? 
Can we make this per-resource? (80% memory, 120% CPU)?


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2016-01-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083919#comment-15083919
 ] 

Wangda Tan commented on YARN-4335:
--

Thanks [~kkaranasos] for updating the patch

bq. Just double-checking: do you mean adding an additional description in the 
javadoc of the ExecutionType?
Yes

And:
{code}
@Public
@Stable
public static ResourceRequest newInstance(Priority priority, String hostName, ...
{code}
Should be unstable as well.

As [~kasha] 
[commented|https://issues.apache.org/jira/browse/YARN-1011?focusedCommentId=15076749=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15076749]
 on YARN-1011:
bq. Given this complication, I would prefer we do not involve AMs in the 
decision-making process. Based on the need and usecases, we could revisit this 
at a later time. Note that YARN-4335 adds this to ResourceRequest for 
distributed scheduling, and even there they are not entirely sure if it needs 
to be a part.

I'm not sure whether any tasks of YARN-2877 directly depend on it. It's fine 
with me to commit this to the YARN-2877 branch, but I would prefer to hold off 
committing until some task requires it.

> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.
> We will initially support two resource request types: CONSERVATIVE and 
> OPTIMISTIC.
> CONSERVATIVE resource requests will be handed internally to containers of 
> GUARANTEED type, whereas OPTIMISTIC resource requests will be handed to 
> QUEUEABLE containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-4546) ResourceManager crash due to scheduling opportunity overflow

2016-01-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe moved MAPREDUCE-6599 to YARN-4546:
-

Affects Version/s: (was: 2.6.1)
   2.6.1
 Target Version/s: 2.7.3, 2.6.4  (was: 2.7.3, 2.6.4)
  Component/s: (was: resourcemanager)
   resourcemanager
  Key: YARN-4546  (was: MAPREDUCE-6599)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

> ResourceManager crash due to scheduling opportunity overflow
> 
>
> Key: YARN-4546
> URL: https://issues.apache.org/jira/browse/YARN-4546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.1
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
>
> If a resource request lingers long enough unsatisfied then the scheduling 
> opportunities count for the request can overflow and cause an RM crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-05 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4265:

Attachment: YARN-4265-trunk.003.patch

Refresh the patch once more to:
# Add documentation about the cache item and its main features
# Double-check synchronization and fix unsynchronized accessors. 
# Change access modifiers on some methods to make them stricter. 

Ran findbugs locally and it looks fine. 

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265-trunk.003.patch, YARN-4265.YARN-4234.001.patch, 
> YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior as the 
> EntityFileTimelineStore proposed in YARN-3942, but cache date in cache id 
> granularity, instead of application id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration

2016-01-05 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-4161:

Attachment: YARN-4161.patch

Attaching patch
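
As a rough illustration of the behavior described in the issue below, here is a 
sketch of a per-heartbeat assignment cap; all names are hypothetical and not 
taken from the attached patch.
{code}
import java.util.function.BooleanSupplier;

// Cap how many containers a single node heartbeat may be assigned so that
// allocations spread across the cluster instead of packing onto the first
// node to report.
class PerHeartbeatAssignmentSketch {
  private final int maxAssignPerHeartbeat;   // hypothetical configuration value

  PerHeartbeatAssignmentSketch(int maxAssignPerHeartbeat) {
    this.maxAssignPerHeartbeat = maxAssignPerHeartbeat;
  }

  /** tryAssignOne returns true if one more container could be scheduled. */
  int assignOnHeartbeat(BooleanSupplier tryAssignOne) {
    int assigned = 0;
    while (assigned < maxAssignPerHeartbeat && tryAssignOne.getAsBoolean()) {
      assigned++;   // stop once the configured cap is reached
    }
    return assigned;
  }
}
{code}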

> Capacity Scheduler : Assign single or multiple containers per heart beat 
> driven by configuration
> 
>
> Key: YARN-4161
> URL: https://issues.apache.org/jira/browse/YARN-4161
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-4161.patch
>
>
> Capacity Scheduler right now schedules multiple containers per heart beat if 
> there are more resources available in the node.
> This approach works fine; however, in some cases it does not distribute the 
> load across the cluster, and hence cluster throughput suffers. I am adding a 
> feature to drive this through configuration so that we can control the number 
> of containers assigned per heart beat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4541) Change log message in LocalizedResource#handle() to DEBUG

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083829#comment-15083829
 ] 

Hadoop QA commented on YARN-4541:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 54s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 18s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 30s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780617/YARN-4541.002.patch |
| JIRA Issue | YARN-4541 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux c85a69ec18d7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083846#comment-15083846
 ] 

Wangda Tan commented on YARN-4538:
--

Thanks for reporting this issue, [~bibinchundatt].

The root cause is as you said: _incrPendingResources and _decrPendingResources 
have different implementations.

I think there are two fixes needed:
- Add {{max(containers, 1)}} to _incrPendingResources
- Callers of {{QueueMetrics#incrPendingResources/decrPendingResources}} should 
avoid calling them when #containers == 0 and the update is not a container 
increase/decrease.

Thoughts?
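
A sketch of the first suggested fix, with illustrative fields rather than the 
actual QueueMetrics code:
{code}
// The increment path applies the suggested max(containers, 1) so a
// zero-container update still accounts for one container's worth of resources.
class PendingMetricsSketch {
  private long pendingContainers;
  private long pendingMemoryMb;
  private long pendingVcores;

  void incrPendingResources(int containers, long memoryMb, int vcores) {
    int normalized = Math.max(containers, 1);   // the suggested max(containers, 1)
    pendingContainers += containers;
    pendingMemoryMb += memoryMb * normalized;
    pendingVcores += (long) vcores * normalized;
  }
}
{code}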

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Submit 2 application to default queue 
> Check queue metrics for pending cores and memory
> {noformat}
> List allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output :*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4541) Change log message in LocalizedResource#handle() to DEBUG

2016-01-05 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083857#comment-15083857
 ] 

Ray Chiang commented on YARN-4541:
--

No test needed for variable name fix.
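
For context, a minimal sketch of the kind of change being proposed for the 
statement quoted in the description below, assuming the standard guarded-debug 
pattern (this is not the attached patch):
{code}
// Same message as today, but only emitted at DEBUG level and only when the
// logger actually has DEBUG enabled.
if (oldState != newState && LOG.isDebugEnabled()) {
  LOG.debug("Resource " + resourcePath + (localPath != null ?
      "(->" + localPath + ")" : "") + " transitioned from " + oldState
      + " to " + newState);
}
{code}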

> Change log message in LocalizedResource#handle() to DEBUG
> -
>
> Key: YARN-4541
> URL: https://issues.apache.org/jira/browse/YARN-4541
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
> Attachments: YARN-4541.001.patch, YARN-4541.002.patch
>
>
> This section of code can fill up a log fairly quickly.
> {code}
>if (oldState != newState) {
> LOG.info("Resource " + resourcePath + (localPath != null ?
>   "(->" + localPath + ")": "") + " transitioned from " + oldState
> + " to " + newState);
>}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083867#comment-15083867
 ] 

Hadoop QA commented on YARN-4161:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} 
| {color:red} YARN-4161 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780627/YARN-4161.patch |
| JIRA Issue | YARN-4161 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10160/console |


This message was automatically generated.



> Capacity Scheduler : Assign single or multiple containers per heart beat 
> driven by configuration
> 
>
> Key: YARN-4161
> URL: https://issues.apache.org/jira/browse/YARN-4161
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-4161.patch
>
>
> Capacity Scheduler right now schedules multiple containers per heart beat if 
> there are more resources available in the node.
> This approach works fine; however, in some cases it does not distribute the 
> load across the cluster, and hence cluster throughput suffers. I am adding a 
> feature to drive this through configuration so that we can control the number 
> of containers assigned per heart beat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083905#comment-15083905
 ] 

Wangda Tan commented on YARN-1011:
--

bq. So, hopefully we can make the policy quite configurable so that the amount 
of disfavoring can be tuned for various workloads.
I am also +1 on making this policy configurable; this is why I filed YARN-4511.

Instead of a threshold on the "current" instantaneous usage, I think we can use 
a threshold on the "recorded" usage. For example, only 50% of the 
*maximum-under-utilized* resource over the past 30 minutes for each NM could be 
used to allocate opportunistic containers. This could reduce the number of 
opportunistic containers we can allocate, but it could also reduce the risk 
that opportunistic containers affect other containers.
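
A rough sketch of that record-based heuristic; the names and the choice of 
memory as the tracked resource are illustrative, and whether to track the 
maximum or the minimum observed spare over the window is itself a policy 
choice:
{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Keep per-NM samples of unused-but-allocated memory over a trailing window
// and expose a configurable fraction of the tracked figure as headroom for
// opportunistic containers.
class OpportunisticHeadroomSketch {
  private final Deque<Long> spareMbSamples = new ArrayDeque<>();
  private final int windowSize;      // e.g. samples covering ~30 minutes
  private final double fraction;     // e.g. 0.5 for the proposed 50%

  OpportunisticHeadroomSketch(int windowSize, double fraction) {
    this.windowSize = windowSize;
    this.fraction = fraction;
  }

  void recordSample(long allocatedMb, long usedMb) {
    spareMbSamples.addLast(Math.max(0, allocatedMb - usedMb));
    if (spareMbSamples.size() > windowSize) {
      spareMbSamples.removeFirst();
    }
  }

  long headroomMb() {
    long tracked =
        spareMbSamples.stream().mapToLong(Long::longValue).max().orElse(0);
    return (long) (tracked * fraction);
  }
}
{code}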

bq. Can we make this per-resource? (80% memory, 120% CPU)?
+1


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3692) Allow REST API to set a user generated message when killing an application

2016-01-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083018#comment-15083018
 ] 

Rohith Sharma K S commented on YARN-3692:
-

Thanks [~Naganarasimha Garla] and [~jlowe] for sharing your thoughts.
1. I will consider defining a new API with an additional parameter: 
{{killApplication(ApplicationId applicationId, String reason)}} (a sketch 
follows below)
2. As of now, the existing kill API will not be deprecated.
3. Generating a useful diagnostic message is a very good improvement.
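
A sketch of the addition mentioned in item 1; the surrounding abstract class is 
illustrative and not the real YarnClient:
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Only the second signature is the proposed addition.
public abstract class KillApiSketch {
  /** Existing style of kill call, without a diagnostic message. */
  public abstract void killApplication(ApplicationId applicationId)
      throws YarnException, IOException;

  /** Proposed overload: kill the application and record the caller's reason. */
  public abstract void killApplication(ApplicationId applicationId, String reason)
      throws YarnException, IOException;
}
{code}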


> Allow REST API to set a user generated message when killing an application
> --
>
> Key: YARN-3692
> URL: https://issues.apache.org/jira/browse/YARN-3692
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajat Jain
>Assignee: Rohith Sharma K S
>
> Currently YARN's REST API supports killing an application without setting a 
> diagnostic message. It would be good to provide that support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4542) Cleanup AHS code and configuration

2016-01-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083029#comment-15083029
 ] 

Junping Du commented on YARN-4542:
--

Hi Naga, please feel free to take it up. Thanks!

> Cleanup AHS code and configuration
> --
>
> Key: YARN-4542
> URL: https://issues.apache.org/jira/browse/YARN-4542
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>
> ATS (Application Timeline Server/Service; we already have many versions so 
> far) has been designed and implemented to replace AHS for a long time. We 
> should consider cleaning up AHS-related configuration and code later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083031#comment-15083031
 ] 

Junping Du commented on YARN-2902:
--

Sounds like a good plan. Thanks!

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2016-01-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083034#comment-15083034
 ] 

Junping Du commented on YARN-3697:
--

Thanks [~zxu]!

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
> sometimes. 
> The reason is because the InterruptedException is blocked in 
> continuousSchedulingAttempt
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> 

[jira] [Updated] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4537:

Attachment: 0003-YARN-4537.patch

Updated the patch to address the review comments. Kindly review the patch.
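
For reference, a sketch of the compound-comparator idea described in the issue 
below, with simplified types rather than the actual FifoOrderingPolicy code:
{code}
import java.util.Comparator;

// Compare by priority first, then fall back to FIFO order.
class SchedulableApp {
  int priority;        // higher value = higher priority (assumption)
  long applicationId;  // smaller id = submitted earlier
}

class OrderingSketch {
  static final Comparator<SchedulableApp> PRIORITY =
      Comparator.comparingInt((SchedulableApp a) -> a.priority).reversed();
  static final Comparator<SchedulableApp> FIFO =
      Comparator.comparingLong(a -> a.applicationId);
  // Compound comparator: priority takes precedence, FIFO breaks ties.
  static final Comparator<SchedulableApp> PRIORITY_THEN_FIFO =
      PRIORITY.thenComparing(FIFO);
}
{code}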

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch, 
> 0003-YARN-4537.patch
>
>
> Currently, priority comparison is integrated with FifoComparator. There 
> should be a separate comparator defined for priority comparison so that, down 
> the line, if any new ordering policy wants to integrate priority, it can use a 
> compound comparator where priority takes the higher preference. 
> The following changes are expected to be done as part of this JIRA
> # Pull out priority comparison from FifoComparator
> # Define new priority comparator
> # Use a compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4546) ResourceManager crash due to scheduling opportunity overflow

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084475#comment-15084475
 ] 

Hadoop QA commented on YARN-4546:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 9s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 15s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 137m 43s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780634/YARN-4546.001.patch |
| JIRA Issue | YARN-4546 |
| Optional Tests |  asflicense  

[jira] [Updated] (YARN-4544) All the log messages about rolling monitoring interval are shown with the WARN level

2016-01-05 Thread Takashi Ohnishi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takashi Ohnishi updated YARN-4544:
--
Description: 
About yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds, there 
are three log messages corresponding to the value set to this parameter, but 
all of them are shown with the WARN level.

(a) disabled (default)
{code}
2016-01-05 22:19:29,062 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log 
rolling mornitoring interval is disabled. The logs will be aggregated after 
this application is finished.
{code}

(b) enabled
{code}
2016-01-06 00:41:15,808 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 rollingMonitorInterval is set as 7200. The logs will be aggregated every 7200 
seconds
{code}

(c) enabled but wrong configuration
{code}
2016-01-06 00:39:50,820 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 rollingMonitorIntervall should be more than or equal to 3600 seconds. Using 
3600 seconds instead.
{code}

I think it is better to output with WARN only in case (c), but it is ok to 
output with INFO in case (a) and (b).


  was:
About yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds, there 
are three log messages corresponding to the value set to this parameter, but 
all of them are shown with the WARN level.

(a) disabled (default)
{code}
2016-01-05 22:19:29,062 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log 
rolling mornitoring interval is disabled. The logs will be aggregated 
after this application is finished.
{code}

(b) enabled
{code}
2016-01-06 00:41:15,808 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 rollingMonitorInterval is set as 7200. The logs will be aggregated every 7200 
seconds
{code}

(c) enabled but wrong configuration
{code}
2016-01-06 00:39:50,820 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 rollingMonitorIntervall should be more than or equal to 3600 seconds. Using 
3600 seconds instead.
{code}

I think it is better to output with WARN only in case (c), but it is ok to 
output with INFO in case (a) and (b).



> All the log messages about rolling monitoring interval are shown with the 
> WARN level
> 
>
> Key: YARN-4544
> URL: https://issues.apache.org/jira/browse/YARN-4544
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.7.1
>Reporter: Takashi Ohnishi
> Attachments: YARN-4544.patch
>
>
> About yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds, 
> there are three log messages corresponding to the value set to this 
> parameter, but all of them are shown with the WARN level.
> (a) disabled (default)
> {code}
> 2016-01-05 22:19:29,062 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log 
> rolling mornitoring interval is disabled. The logs will be aggregated after 
> this application is finished.
> {code}
> (b) enabled
> {code}
> 2016-01-06 00:41:15,808 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorInterval is set as 7200. The logs will be aggregated every 
> 7200 seconds
> {code}
> (c) enabled but wrong configuration
> {code}
> 2016-01-06 00:39:50,820 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorIntervall should be more than or equal to 3600 seconds. Using 
> 3600 seconds instead.
> {code}
> I think it is better to output with WARN only in case (c), but it is ok to 
> output with INFO in case (a) and (b).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

